NAVAL 

POSTGRADUATE 

SCHOOL 

MONTEREY,  CALIFORNIA 


THESIS 


INTELLIGENT-AGENT-BASED MANAGEMENT OF
HETEROGENEOUS  NETWORKS  FOR  THE  ARMY 
ENTERPRISE 

by 

Clyde  E.  Richards  Jr. 

September  2003 

Thesis Advisor: Alex Bordetsky

Second  Reader:  James  O’Donnell 


Approved  for  public  release;  distribution  is  unlimited 




REPORT DOCUMENTATION PAGE

Form Approved OMB No. 0704-0188

Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including 
the  time  for  reviewing  instruction,  searching  existing  data  sources,  gathering  and  maintaining  the  data  needed,  and 
completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any 
other  aspect  of  this  collection  of  information,  including  suggestions  for  reducing  this  burden,  to  Washington 
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite
1204,  Arlington,  VA  22202-4302,  and  to  the  Office  of  Management  and  Budget,  Paperwork  Reduction  Project 
(0704-0188)  Washington  DC  20503. 

1.  AGENCY  USE  ONLY  (Leave  blank)  2.  REPORT  DATE  3.  REPORT  TYPE  AND  DATES  COVERED 

September  2003  Master’s  Thesis 

4.  TITLE  AND  SUBTITLE: 

Intelligent-Agent-Based Management of Heterogeneous Networks for the Army
Enterprise

5.  FUNDING  NUMBERS 

6.  AUTHOR(S)  Clyde  E.  Richards 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Naval  Postgraduate  School 

Monterey,  CA  93943-5000 

8.  PERFORMING 

ORGANIZATION  REPORT 
NUMBER 

9.  SPONSORING  /MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

N/A 

10.  SPONSORING/MONITORING 
AGENCY  REPORT  NUMBER 

11.  SUPPLEMENTARY  NOTES  The  views  expressed  in  this  thesis  are  those  of  the  author  and  do  not  reflect  the  official 
policy  or  position  of  the  Department  of  Defense  or  the  U.S.  Government. 

12a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  is  unlimited 

12b.  DISTRIBUTION  CODE 

13.  ABSTRACT  (maximum  200  words) 

The  Army  is  undergoing  a  major  realignment  in  accordance  with  the  Joint  Vision  2010/2020  transformation  to 
establish  an  enterprise  command  that  is  the  single  authority  to  operate  and  manage  the  Army  Enterprise 
Information  Infrastructure  (Infostructure).  However,  there  are  a  number  of  critical  network  management  issues 
that  the  Army  will  have  to  overcome  before  attaining  the  full  capabilities  to  manage  the  full  spectrum  of  Army 
networks  at  the  enterprise  level.  The  Army  network  environment  consists  of  an  excessive  number  of 
heterogeneous  applications,  systems,  and  network  architectures  that  are  incompatible.  There  are  a  number  of 
legacy  systems  and  proprietary  platforms.  Most  of  the  NM  architectures  in  the  Army  are  based  on  traditional 
centralized  NM  approaches  such  as  the  Simple  Network  Management  Protocol  (SNMP).  Although  SNMP  is  the 
most  pervasive  protocol,  it  lacks  the  scalability,  reliability,  flexibility  and  adaptability  necessary  to  effectively 
support  an  enterprise  network  as  large  and  complex  as  the  Army.  Attempting  to  scale  these  technologies  to  this 
magnitude  can  be  extremely  difficult  and  very  costly.  This  thesis  makes  the  argument  that  intelligent-agent-based 
technologies  are  a  leading  solution,  among  the  other  current  technologies,  to  achieve  the  Army’s  enterprise 
network  management  goals. 

14.  SUBJECT  TERMS 

Intelligent  Agent,  SNMP,  Enterprise  Network  Management,  CoABS,  Army  Enterprise 
Infostructure,  Global  Information  Grid 

15.  NUMBER  OF 
PAGES 

141 

16.  PRICE  CODE 

17.  SECURITY 
CLASSIFICATION  OF 

REPORT 

Unclassified 

18.  SECURITY 

CLASSIFICATION  OF  THIS 
PAGE 

Unclassified 

19.  SECURITY 
CLASSIFICATION  OF 
ABSTRACT 

Unclassified 

20.  LIMITATION 

OF  ABSTRACT 

UL 

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)


Prescribed by ANSI Std. Z39-18




Approved  for  public  release,  distribution  is  unlimited. 


INTELLIGENT-AGENT-BASED MANAGEMENT OF HETEROGENEOUS
NETWORKS  FOR  THE  ARMY  ENTERPRISE 

Clyde  E.  Richards  Jr. 

Major,  United  States  Army 
B.A.,  Rutgers  University,  1989 


Submitted  in  partial  fulfillment  of  the 
requirements  for  the  degree  of 


MASTER  OF  SCIENCE  IN  INFORMATION  TECHNOLOGY  MANAGEMENT 


from  the 


NAVAL  POSTGRADUATE  SCHOOL 
September  2003 


Author:  Clyde  E.  Richards  Jr. 


Approved  by:  Dr.  Alex  Bordetsky 

Thesis  Advisor 


Mr.  James  O’Donnell 
Second Reader/Co-Advisor


Dr.  Dan  Boger 

Chairman,  Information  Systems  Academic  Group 




ABSTRACT 


The  Army  is  undergoing  a  major  realignment  in  accordance  with  the  Joint  Vision 
2010/2020 transformation to establish an enterprise command that is the single authority
to operate and manage the Army Enterprise Information Infrastructure (Infostructure).
However, there are a number of critical network management issues that the Army will
have to overcome before attaining the full capabilities to manage the full spectrum of
Army networks at the enterprise level. The Army network environment consists of an
excessive number of heterogeneous applications, systems, and network architectures that
are incompatible. There are a number of legacy systems and proprietary platforms. Most
of the NM architectures in the Army are based on traditional centralized NM approaches
such as the Simple Network Management Protocol (SNMP). Although SNMP is the most
pervasive protocol, it lacks the scalability, reliability, flexibility and adaptability
necessary to effectively support an enterprise network as large and complex as the Army.
Attempting  to  scale  these  technologies  to  this  magnitude  can  be  extremely  difficult  and 
very  costly.  This  thesis  makes  the  argument  that  intelligent-agent-based  technologies  are 
a  leading  solution,  among  the  other  current  technologies,  to  achieve  the  Army’s 
enterprise  network  management  goals. 
I.  INTRODUCTION


...the  network  will  become  a  weapon  system  and  should  have  a  command 
relationship  commensurate  with  that  of  normal  operational  forces... 

-  BG  (P)  James  D.  Bryan,  USA  Commander,  JTF-CND 


A.  DISCUSSION 

The  thesis  of  this  paper  is  the  argument  that  intelligent-agent-based  technologies 
are  a  leading  solution,  among  other  current  technologies,  to  achieve  the  Army’s 
enterprise  network  management  goals.  The  Army  is  undergoing  a  major  realignment  in 
accordance  with  the  Joint  Vision  2010/2020  transformation  to  establish  an  enterprise 
command  that  is  the  single  authority  to  operate  and  manage  the  Army  Enterprise 
Information  Infrastructure  (Infostructure).  However,  there  are  a  number  of  critical 
network  management  issues  that  the  Army  will  have  to  overcome  before  attaining  the  full 
capabilities  to  manage  the  full  spectrum  of  Army  networks  at  the  enterprise  level.  Over 
the years, the Army information infrastructure has evolved into a number of stovepiped
networks,  contemporary  and  legacy  systems,  and  heterogeneous  applications  due  to  the 
lack  of  centralized  configuration  management  and  control. 

The  Department  of  Defense’s  effort  to  enable  the  overarching  JV  2010/2020 
concepts  is  the  driving  force  behind  the  Army’s  need  to  establish  a  single  enterprise-level 
network  management  architecture.  JV  2010/2020  envisions  the  development  of  a 
superior  joint  force  that  is  capable  of  achieving  full  spectrum  dominance  across  the  range 
of  military  operations.  The  pathway  to  full  spectrum  dominance  is  the  underlying  layered 
concepts  of  decision  superiority,  information  superiority,  network-centric  warfare 
(NCW),  and  the  global  information  grid  (GIG).  Each  of  the  respective  concepts,  starting 
with  the  GIG,  provides  a  distinctive  capability  and  is  the  foundation  that  supports  the 
preceding  layer.  These  individual  layers  are  the  building  blocks  that  enable  full  spectrum 
dominance. 

Information superiority is the key enabler to achieve decision superiority for the
warfighter, which ultimately leads to full spectrum dominance. Further, information
superiority is enabled by the NCW concept, which requires the aggregation and
interoperability of the stovepiped networks, legacy systems, and applications. The GIG
is the underlying infrastructure that supports NCW. The Army’s portion of the GIG is
called the Army Enterprise Infostructure (AEI). As can be seen, network management
(NM) plays a vital role in realizing the JV 2010/2020 concept. Through effective NM,
networks must provide the necessary bandwidth availability, reliability, and quality of
service for information exchange in an accurate, timely, and secure fashion.

There  are  a  number  of  obstacles  that  the  Army  will  have  to  overcome  before 
achieving  an  effective  enterprise  NM  solution.  The  Army  network  environment  consists 
of  an  excessive  number  of  heterogeneous  applications,  systems,  and  network 
architectures  that  are  incompatible.  There  are  a  number  of  legacy  systems  that  hinder 
interoperability.  There  are  a  number  of  proprietary  platforms,  including  NM  platforms. 
The  NM  platforms  are  based  on  different  protocols  and  standards.  Most  of  the  NM 
architectures  are  based  on  traditional  centralized  NM  approaches  such  as  the  Simple 
Network Management Protocol (SNMP), and the Common Management Information
Protocol (CMIP). Although SNMP and CMIP are the most pervasive protocols, these
standards apply agent technology in a very primitive way. Although advancements were
made, such as in SNMP version 3 (SNMPv3), these protocols still lack the scalability,
reliability, and adaptability necessary to effectively support an enterprise network as large
and complex as the Army. Attempting to scale these technologies to this magnitude can
be  extremely  difficult  and  very  costly.  This  leads  to  the  main  research  question:  what 
alternative  technologies  can  scale  and  meet  the  Army  Enterprise  Infostructure  network 
management  and  situational  awareness  requirements? 

This study proposes that intelligent agent technologies are a leading future
solution to address the aforementioned problems. Although agent technology solutions
for network management are fairly immature, there are a number of studies that indicate
they are promising. Agent-based technology has the capability to distribute intelligence
throughout the network and dynamically perform NM functions on an as-needed basis.
This provides efficiency and flexibility, and dramatically cuts down on bandwidth
constriction and overloading on a single, central processor. Agents can be added to any
agent environment “on the fly,” and because of their small size can scale well.
These properties, as well as others, make intelligent-agent-based
technologies an ideal solution for the AEI network management requirement.

B.  RESEARCH  QUESTIONS 

1.  What alternative technologies can scale and meet the Army Enterprise
Infostructure (AEI) network management and situational awareness requirements?

Sub-research  questions: 

a.  How  can  intelligent-agent-based  technologies  be  used  to  establish  network 
management  control  and  network  situational  awareness  of  the  AEI? 

b.  How  can  intelligent-agent-based  technologies  be  used  to  execute  Fault, 
Configuration,  Accounting,  Performance,  and  Security  (FCAPS) 
management  or  establish  a  network  common  operational  picture 
(NETCOP)? 

c.  How  does  intelligent-agent-based  technology  for  enterprise  network 
management  compare  to  SNMP-based  and  other  distributed  management 
technologies? 

d.  Can  an  intelligent-agent-based  network  management  architecture  scale  to 
support  the  AEI? 

e.  How  can  the  Control  of  Agent  Based  Systems  (CoABS)  be  leveraged  to 
support  an  intelligent-agent-based  enterprise-level  network  management 
architecture  for  the  AEI? 

C.  SCOPE 

The  scope  of  this  thesis  covers  why  intelligent-agent-based  systems  are  very  well 
suited to meet the Army’s AEI network management requirements and how an
intelligent-agent-based system can be applied to the AEI to achieve enterprise network
management and network situational awareness. This thesis looks at the capabilities
required for enterprise network management and the shortcomings that the Army will
have to overcome based on the disposition of the current systems and networks. The
study addresses some of the problems that traditional and distributed network
management protocols pose, and makes the argument for intelligent-agent-based network
management relative to these problem areas. Finally, the thesis presents a conceptual,
high-level  intelligent-agent-based  network  management  design  based  on  current  research 
projects  such  as  the  Control  of  Agent  Based  Systems  (CoABS)  endeavor  sponsored  by 
DARPA. 

D.  METHODOLOGY 

The  nature  of  this  thesis  research  is  to  explore  the  numerous  intelligent-agent- 
based  architectures  that  are  currently  under  study  and  use  this  research  to  make  the 
argument  for  an  intelligent-agent-based  solution  as  opposed  to  traditional  and  other 
methods. Because this field of research is relatively immature, no
full-scale intelligent-agent-based architectures have yet been applied to an enterprise
network management situation. Hence, this study draws on the theory, experimentation
and findings of the various architectures that have been published. The study looks at the
lineage (i.e., the new warfighting concepts) that has led up to the enterprise network
management issues and breaks down the overarching goals to the specific needs for the
AEI.  This  thesis  investigates  the  Army’s  approach  to  establish  enterprise  network 
management  and  a  network  common  operational  picture.  The  thesis  culminates  with  the 
presentation  of  a  conceptual,  high-level  design  that  is  based  on  techniques  and  designs 
that  are  currently  being  developed. 

E.  BENEFITS  OF  THE  STUDY 

This  thesis  research  is  directly  applicable  to  the  military’s  effort  to  manage  the 
envisioned enterprise-level network grids. The research will generate thought-provoking
ideas and present areas of concern that must be addressed in order to fully achieve the
JV 2010/2020 vision. The Army and the other services can benefit from this research by
considering  the  complications,  brought  out  in  the  thesis,  which  will  eventually  be 
encountered  with  the  implementation  of  current  technologies.  By  considering  the 
benefits  of  intelligent-agent-based  technologies  elaborated  in  the  study,  the  Army  can 
seek  the  technology  as  a  future  alternative  that  can  mitigate  the  shortcomings  of  current 
technologies  and  provide  a  cost  effective  solution  for  the  AEI. 

This  research  is  sponsored  by  the  Army  Information  Systems  Engineering 
Command (ISEC), located at Fort Huachuca, Arizona. ISEC is currently pursuing
solutions to support enterprise network management and network situational awareness.
This thesis research will aid ISEC in its pursuit of the best solution.
II.  THE ENTERPRISE NETWORK MANAGEMENT DILEMMA


A.  NETWORK  MANAGEMENT  OVERVIEW 

The  Simple  Network  Management  Protocol  (SNMP)  is  the  most  pervasive  and 
commonly  accepted  and  implemented  network  management  (NM)  standard  today.  The 
Department  of  Defense’s  (DoD’s)  Joint  Technical  Architecture  (JTA)  document 
mandates  SNMP  as  the  data  communications  management  standard  within  the  DoD. 
With the increasing size, management complexity, and service requirements of today’s
networks, classic agent-manager paradigms, such as SNMP, are
inadequate to meet the order-of-magnitude demands of large organizations
such as the DoD.

In the pursuit of achieving a single enterprise-level network across the military
services, the DoD and the services will discover inherent limitations of modern-day
network management protocols. This section reviews network management concepts
and the problems and issues that they pose for the enterprise network management
implementation envisioned within DoD.

1.  Enterprise  Networks 

Enterprise  networks  are  typically  a  conglomeration  of  the  various  sub-networks 
within  an  organization.  These  networks  are  large  and  geographically  dispersed.  They 
consist of many legacy and modern devices, some of which are critical to the network
operation itself while others are essential for the services provided. Configuring,
managing, and monitoring enterprise networks is a monumental task that requires the
requisite network management tools and management expertise.

Generally,  enterprise  networks  are  owned  by  a  single  organization,  such  as  IBM, 
federal  government  bodies,  and  financial  institutions.  These  networks  exist  to  provide 
data  and  telecommunications  services  to  employees,  customers,  and  suppliers.  Services 
can  include  [28]: 

•  File  and  data  storage 

•  Print

•  Email

•  Access to shared applications

•  Internet  access 

•  Intranet 

•  Extranet 

•  E-commerce 

•  Dial  tone 

•  International  desk-to-desk  dialing  (using  voice-over-TDM  or 
voice-over-IP) 

•  Video 

•  LAN  and  virtual  LAN  (VLAN) — often  heavily  over-engineered 
(more  bandwidth  than  necessary)  to  avoid  congestion 

•  Corporate  WAN — can  be  used  for  data  and  also  voice-over-IP 

•  Virtual  private  network  (VPN) — can  be  used  for  securely  joining 
multiple  sites  and  remote  workers  and  replacing  expensive  leased 
lines 

•  Disaster  recovery — maintaining  network  service  after  some 
cataclysmic  event 

Enterprise  networks  achieve  these  and  other  services  by  deploying  a  wide  variety 
of  different  technologies  and  systems. 

Figure 1 depicts a typical simplified enterprise network. As can be seen, an
enterprise  network  encompasses  several  functional  services  such  as  voice,  message, 
network  storage,  and  application  services.  Each  of  the  various  services  supports  a 
number  of  user  specific  functions  such  as  email,  desk  phones,  internet  access,  etc.  The 
connected  boxes  in  the  figure  provide  access  to  the  services.  Large  networks  such  as  this 
can  serve  large  geographically  distributed  corporate  users  that  span  over  hundreds  of 
remote  branch  offices  [28]. 
Figure 1.  Enterprise network functional components


Enterprise  data  flows  can  become  very  complex  once  extranets  and  e-commerce 
are  employed.  Extranets  are  parts  of  intranets  that  are  extended  to  organizations  external 
to  the  enterprise,  such  as  software  contractors.  E-commerce  allows  for  secure  financial 
transactions  between  external  customers  and  a  given  organization.  The  data  flows  in  the 
latter  case  feed  into  various  systems,  such  as  finance,  stock  control,  and  manufacturing. 

It is apparent that supporting a vast enterprise network across functional systems
in a heterogeneous environment calls for a powerful underlying network. Following are
some general features of enterprise networks [28]:

•  They incorporate a wide range of multi-vendor devices, such as
routers, switches, exchanges, PCs, servers, printers, terminal
servers, digital cross-connects, multiplexers, storage devices,
Voice over IP (VoIP) telephones, servers, and firewalls.

•  Network  elements  (NEs)  can  incorporate  other  intelligent  devices, 
such  as  PCs  with  network  interface  cards  (NICs)  and  possibly 
modems.  Likewise,  desk  phones  can  contain  computer-telephony 
integration  (CTI)  hardware  for  applications  like  call  centers  and  e- 
commerce  bureaus. 

•  Individual  NEs  provide  a  variety  of  different  shared  services;  for 
example,  a  legacy  PABX  or  a  soft  switch  provides  basic  telephony 
and  can  form  the  foundation  of  a  call  center.  In  this  way,  a  base 
system  is  leveraged  to  provide  another  system  or  service. 

•  Backup  and  restore  of  NE  firmware  are  important  for  rolling  out 
new  network  services. 
•  Specialized servers are deployed to provide advanced services such
as Storage Area Networks (SANs).

•  Many  users  are  supported  simultaneously. 

•  The  overall  network  services,  such  as  email  and  video/audio 
conferencing,  are  used  by  employees  of  the  organization  as 
essential  business  process  components. 

The  features  and  complexities  in  enterprise  networks  described  above  exemplify 
the massively intertwined networks and services that exist within the DoD and the
military  services.  For  example,  within  the  DoD  the  Defense  Information  Systems 
Network  (DISN)  is  the  key  wide-area  communications  component  that  aggregates  a 
multitude  of  sub-networks  across  the  enterprise.  Some  of  the  DISN  networks  are  listed 
below: 

•  Defense  Red  Switch  Network  for  classified  voice  conferencing 

•  Secret  Internet  Protocol  Router  Network  (SIPRNET) 

•  Non-Secure  Internet  Protocol  Router  Network  (NIPRNET) 

•  Enhanced  Mobile  Satellite  Services  (EMSS) 

•  DISN  Voice  Communication  Systems  (DSCS) 

•  Defense  Switched  Network  (DSN-voice  traffic) 

•  Defense  Message  System  (DMS) 

There  are  also  a  host  of  service  independent  enterprise  networks  that  connect  to 
the DISN for wide-area transport and support. For example, the Army has the
Army Enterprise Infostructure (AEI), which is under development, the Warfighter
Information  Network  -  Tactical  (WIN-T)  and  the  Common  User  Installation  Transport 
Network.  Another  example  is  the  Navy’s  Navy  Marine  Corps  Intranet  (NMCI)  initiative. 

2.  Network  Management 

In  the  late  1970s  and  early  1980s,  networking  in  organizations  began  to  thrive. 
With  the  growing  size  of  the  networks  and  rising  number  of  network  platforms  and 
devices, many disparate network management solutions worked their way into these
organizations. Many organizations experienced a myriad of network management
problems due to issues such as multi-vendor interoperability, lack of expertise, and
reliability. There were a few generic tools available for managing networks; the more
sophisticated management tools available were typically proprietary, which limited their
use. Most available management tools were used in an ad hoc fashion.

As  the  networks  grew  in  size  and  proliferated,  they  became  even  more  complex  to 
manage.  Network  administrators  and  operators  had  to  become  adept  at  handling  the 
many  ambiguous  anomalies  that  surfaced.  The  increased  complexity  of  operations 
created  a  demand  for  common,  vendor-neutral,  interoperable,  and  integrated  solutions. 
As  a  result,  two  standards  emerged  in  the  late  1980s:  the  Common  Management 
Information  Protocol  (CMIP)  published  by  the  International  Organization  for 
Standardization  (ISO)  and  the  Simple  Network  Management  Protocol  (SNMP)  published 
by  the  Internet  Engineering  Task  Force  (IETF). 

Network  management  comprises  all  the  measures  necessary  to  ensure  effective 
and  efficient  operations  of  a  networked  system.  This  includes  the  deployment, 
integration  and  coordination  of  the  hardware,  software,  and  human  elements  to  monitor, 
test,  poll,  configure,  analyze,  evaluate,  and  control  the  network  and  element  resources  to 
meet the real-time, operational performance, and quality of service requirements at a
reasonable  cost  [7]. 

The  goals  of  NM  are  to  provide  the  services  and  applications  of  a  networked 
system  with  the  desired  level  of  quality  and  to  guarantee  availability  and  rapid,  flexible 
deployment  of  networked  resources  [6].  This  includes  the  detection  and  handling  of 
faults,  performance  inefficiencies,  and  security  compromises.  To  accomplish  these  goals, 
management  applications  are  designed  to  do  the  following  [26]: 

•  Collect  real  time  data  from  network  elements,  such  as  routers, 
switches,  and  workstations.  For  example,  they  collect  the  number 
of  packets  handled  by  the  given  interface  of  a  router. 

•  Interpret  and  analyze  the  data  collected.  For  instance,  they  may 
recognize  security  events,  such  as  repeated  illegal  attempts  to  login 
on  a  workstation. 

•  Present  this  information  to  authorized  network  operators,  possibly 
by  displaying  a  map  of  current  traffic. 
•  Proactively react, in real time, to management problems, possibly
by disabling a link that is experiencing faults.
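The four activities listed above can be sketched as a single management cycle. The following Python fragment is only an illustration under stated assumptions: the `Sample` structure, the threshold values, and the action names are hypothetical and are not taken from [26] or from any particular NM product. Data collection is represented by a list of already-polled samples, and presentation to the operator is reduced to a printed line.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One polled reading from a network element (hypothetical structure)."""
    element: str
    failed_logins: int  # failed login attempts seen since the last poll
    link_errors: int    # transmission errors seen since the last poll


# Illustrative thresholds only; real values would come from site policy.
MAX_FAILED_LOGINS = 5
MAX_LINK_ERRORS = 100


def analyze(sample):
    """Interpret one sample and return the events it implies."""
    events = []
    if sample.failed_logins > MAX_FAILED_LOGINS:
        events.append("security: repeated failed logins")
    if sample.link_errors > MAX_LINK_ERRORS:
        events.append("fault: excessive link errors")
    return events


def react(events):
    """Choose a corrective action; a fault takes priority over a notification."""
    if any(e.startswith("fault") for e in events):
        return "disable-link"
    if events:
        return "notify-operator"
    return "none"


def management_cycle(samples):
    """Run analyze -> present -> react over one batch of collected samples."""
    actions = {}
    for s in samples:
        events = analyze(s)
        for e in events:
            print(f"{s.element}: {e}")  # stand-in for the operator display
        actions[s.element] = react(events)
    return actions
```

In a centralized architecture this cycle runs on a single management station for every element; the agent-based approach argued for in this thesis would instead run the analysis and reaction steps close to the managed elements.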

3.  Requirements  for  Network  Management 

There  are  a  number  of  requirements  that  drive  an  organization  to  incorporate  a 
NM  system.  Below  William  Stallings  [8]  provides  insight  on  organizational 
requirements  that  justify  an  investment  in  NM: 

•  Controlling corporate strategic assets: Networks and distributed
computing resources are increasingly vital resources for most
organizations. Without effective control, these resources do not
provide the payback that corporate management requires.

•  Controlling complexity: The continued growth in the number of
network components, end users, interfaces, protocols, and vendors
threatens management with loss of control over what is connected
to the network and how network resources are used.

•  Improving service: End users expect the same or improved service
as the information and computing resources of the organization
grow and distribute.

•  Balancing various needs: The information and computing
resources of an organization must provide a spectrum of end users
with various applications at given levels of support, with specific
requirements in the areas of performance, availability, and
security. The network manager must assign and control resources
to balance these various needs.

•  Reducing downtime: As the network resources of an organization
become more important, minimum availability requirements
approach 100 percent. In addition to proper redundant design,
network management has an indispensable role to play in ensuring
high availability of its resources.

•  Controlling costs: Resource utilization must be monitored and
controlled to enable essential end-user needs to be satisfied with
reasonable cost.

These requirements are critical not only for private sector enterprises but also
for military organizations carrying out their missions. In fact, these
requirements are vital enablers during a wartime situation in order for combat units to
communicate within an organization, across the services, and with coalition forces,
commercial supporters, and CONUS sustaining-base organizations. With the military
migrating towards network-centric operations and fielding increasing numbers of
network-dependent weapon systems, the slightest failure in the network could mean
disaster on the battlefield. Therefore, NM in the military must be reliable, efficient, and
effective from the enterprise level on down.

4.  Network  Management  Functions 

The  International  Organization  for  Standardization  Network  Management  Forum 
has  divided  network  management  into  five  functional  areas  that  are  recognized  and  used 
as  a  baseline  within  the  industry:  Fault  Management,  Configuration  Management, 
Accounting Management, Performance Management, and Security Management
(FCAPS).  The  FCAPS  principles  are  further  elaborated  below  [9]: 

a.  Fault  Management 

Fault  management  is  the  process  of  detecting  and  correcting  network 
problems,  otherwise  known  as  faults.  Faults  typically  manifest  themselves  as 
transmission  errors  or  failures  in  the  equipment  or  interface.  Faults  result  in  unexpected 
downtime, performance degradation and loss of data. Generally, fault conditions need to
be  resolved  as  quickly  as  possible. 

b.  Configuration  Management 

The  configuration  management  functions  detect  and  control  the  state  of 
the  network  resources.  This  entails  the  initialization,  modification,  and  shutdown  of  a 
network.  Networks  are  continually  adjusted  when  devices  are  added,  removed, 
reconfigured,  or  updated.  These  changes  may  be  intentional,  such  as  adding  a  new  server 
to the network, or path related, such as a fiber cut between two devices resulting in a
rerouted path. If a network is to be turned off, then a graceful shutdown in a prescribed
sequence is performed as part of the configuration management process. The process of
configuration management involves identifying the network components and their
connections, collecting each device's configuration information, and defining the
relationship between network components. In order to perform these tasks, the network
manager  needs  topological  information  about  the  network,  device  configuration 
information,  and  control  of  the  network  component. 
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c.  Accounting  Management 

Accounting  management  functions  collect  and  process  resource 
consumption  data.  This  type  of  management  involves  monitoring  the  login  and  logoff 
records,  and  checking  the  network  usage.  This  is  done  to  determine  a  user's  use  of  the 
network  for  the  purposes  of  allocation  of  resources  and  billing  for  their  usage. 
Additionally,  this  type  of  information  helps  a  network  manager  allocate  the  right  kind  of 
resources  to  users,  as  well  as  plan  for  network  growth. 

d.  Performance  Management 

Performance  management  involves  measuring  the  performance  of  a 
network  and  its  resources  in  terms  of  utilization,  throughput,  error  rates,  and  response 
times.  With  performance  management  information,  a  network  manager  can  reduce  or 
prevent  network  overcrowding  and  inaccessibility.  This  helps  provide  a  more  consistent 
level  of  service  to  users  on  the  network,  without  overtaxing  the  capacity  of  devices  and 
links.  This  form  of  management  looks  at  the  percentage  of  utilization  of  devices  and  error 
rates  to  help  in  improving  and  balancing  the  throughput  of  traffic  in  all  parts  of  a 
network.  Typically,  some  devices  are  more  highly  utilized  than  others.  Performance 
monitoring  gives  qualitative  and  time  relevant  information  on  the  health  and  performance 
of  devices  so  that  underutilized  devices  are  more  fully  utilized  and  overtaxed  devices  are 
rebalanced. 
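
As a minimal sketch of how the utilization and error-rate figures mentioned above might be derived, assume that the manager takes two samples of a device's octet, packet, and error counters a known number of seconds apart. The function names, the 32-bit counter width, and all sample values below are illustrative assumptions and are not taken from any particular product.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: 32-bit counters that wrap to zero


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter samples, allowing for one wrap."""
    return (current - previous) % modulus


def utilization_percent(octets_prev: int, octets_now: int,
                        interval_s: float, link_bps: float) -> float:
    """Share of the link capacity used during the sampling interval."""
    bits = counter_delta(octets_prev, octets_now) * 8
    return 100.0 * bits / (interval_s * link_bps)


def error_rate(errors_prev: int, errors_now: int,
               packets_prev: int, packets_now: int) -> float:
    """Fraction of packets in the interval that were counted as errors."""
    packets = counter_delta(packets_prev, packets_now)
    if packets == 0:
        return 0.0
    return counter_delta(errors_prev, errors_now) / packets
```

For example, 7,500,000 octets transferred in 60 seconds on a 10 Mbps link corresponds to a utilization of 10 percent. A manager could compare such figures across devices to decide which ones are overtaxed and which are underutilized.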

e.  Security  Management 

Security  management  deals  with  ensuring  overall  security  of  the  network, 
including  protecting  sensitive  information  through  the  control  of  access  points  to  that 
information;  for  example,  blocking  unauthorized  access  to  database  records. 

5.  SNMP  Architecture  and  Functions 

As  mentioned  earlier,  the  CMIP  and  SNMP  standards  emerged  as  a  result 
of  the  great  demand  for  a  common,  vendor-neutral,  interoperable,  and  integrated  network 
management  standard.  The  SNMP  model  was  originally  an  interim,  rudimentary  solution 
to  resolve  the  growing  NM  problems.  The  CMIP  model  was  designed  to  be  more  robust 
and  provide  greater  NM  capabilities  that  would  eventually  replace  SNMP.  However,  this 
never  happened  because  the  CMIP  approach  was  found  to  be  too  complex  for  widespread 
adoption.  The  appeal  of  SNMP  was  its  small  size  (lightweight)  and  simplicity  (ease  of 
implementation,  installation,  and  use).  SNMP  was  widely  accepted  globally  and  has  now 
become  the  de  facto  standard  for  network  management. 

This  section  is  intended  to  give  the  reader  a  fundamental  understanding  of  the 
SNMP  architecture  and  functions  in  order  to  understand  the  problems  with  the  framework 
discussed  later.  This  section  is  focused  on  SNMP,  as  opposed  to  other  standards,  because 
it  is  the  most  ubiquitous  and  widely  accepted  open  standard  today.  While  the  SNMP 
architecture  and  functionality  are  given  at  an  abstract  level,  a  more  detailed  technical 
review  is  provided  in  Appendix  I  for  a  better  understanding.  Additionally,  the  other  most 
common  standards  available  are  also  presented  in  Appendix  I. 

The  SNMP  architecture  is  a  centralized  hierarchical  design  based  on  the  client- 
server  paradigm  (see  Figure  2).  The  architecture  is  made  of  three  core  components: 
managers,  agents,  and  the  management  information  base  (MIB).  The  management  logic 
is  performed  on  a  central  station  (the  client)  called  the  management  entity  or  network 
management  station  (NMS).  The  NMS  is  an  “umbrella”  application  that  integrates  the 
user  interface  with  many  independent  management  applications  called  agents  (the 
server).  Agents  are  software  processes  embedded  on  each  managed  device  that  monitor, 
control,  and  collect  data  from  the  devices.  The  management  data  is  stored  in  a  database 
that  resides  on  managed  devices  where  the  agents  can  retrieve  it.  This  database  is  known 
as  the  management  information  base  or  MIB.  MIBs  are  organized  as  static  directory 
trees  with  managed  data  stored  at  the  tree  leaves  [27].  The  tree  leaves  are  called  MIB 
objects. 


Figure  2.  SNMP  Architecture 
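
The MIB tree described above can be pictured with a small sketch. It assumes that each MIB object is keyed by its object identifier (OID), written here as a tuple of integers, and that "get-next" simply returns the next object in lexicographic OID order. The three OIDs are the standard MIB-II sysDescr, sysUpTime, and sysName instances; the class name `ToyMib` and all stored values are hypothetical.

```python
from bisect import bisect_right

# Standard MIB-II "system" group instances; the values below are made up.
SYS_DESCR = (1, 3, 6, 1, 2, 1, 1, 1, 0)
SYS_UPTIME = (1, 3, 6, 1, 2, 1, 1, 3, 0)
SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)


class ToyMib:
    """Leaves of a MIB tree, kept in lexicographic OID order."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self._oids = sorted(self._objects)  # tuples sort lexicographically

    def get(self, oid):
        """Return the value stored at exactly this OID, or None."""
        return self._objects.get(oid)

    def get_next(self, oid):
        """Return (oid, value) of the first leaf after the given OID."""
        index = bisect_right(self._oids, oid)
        if index == len(self._oids):
            return None  # end of the MIB view
        next_oid = self._oids[index]
        return next_oid, self._objects[next_oid]

    def walk(self, root):
        """Repeat get_next to list every leaf under a subtree root."""
        results = []
        current = root
        while True:
            entry = self.get_next(current)
            if entry is None or entry[0][:len(root)] != root:
                break
            results.append(entry)
            current = entry[0]
        return results


mib = ToyMib({SYS_DESCR: "example router", SYS_UPTIME: 123456, SYS_NAME: "r1"})
```

Walking the subtree rooted at `(1, 3, 6, 1, 2, 1, 1)` visits the three leaves in order. The manager needs one request per leaf, which helps explain why retrieving large tables over the network is expensive.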


There  are  a  number  of  vendors  that  produce  network  management  software  and 
tools  on  the  market.  Product  capabilities  vary  in  the  level  of  management  capability 
and  in  the  breadth  of  management  features  and  functionality.  Many 
products  are  modular,  allowing  customers  to  buy  only  the  necessary  features  that  meet 
their  NM  needs.  The  modules  collectively  make  up  a  suite  that  is  referred  to  as  a 
“framework”  in  the  industry.  The  more  narrowly  focused  products  are  known  as  “point” 
solutions.  Frameworks  tend  to  aggregate  the  data  from  point  solutions  to  provide  insights 
on  the  enterprise  network  as  a  whole  [34].  There  are  few  giant  developers  that  sell 
enterprise-level  suites  that  are  capable  of  centrally  managing  networks  at  the  enterprise 
level.  The  most  recognized  enterprise  NM  manufacturers  today  are  Hewlett-Packard, 
IBM,  Computer  Associates,  BMC  Software,  and  Aprisma. 

The  SNMPv1  design  consists  of  a  single  central  NMS,  as  shown  in  Figure  2. 
However,  the  SNMPv2  design  introduced  the  concept  of  intermediate  network 
management  stations.  This  design  made  it  possible  to  decentralize  the  management 
burden  by  sharing  the  processing  load  with  more  than  one  management  station  (see  Figure 
3).  The  intermediate  NMSs  are  capable  of  sharing  information  with  one  another  and  with 
a  central  NMS  that  aggregates  all  the  management  information  for  display  on  a  single 
user  interface. 


Figure  3.  SNMPv2  architecture 


SNMP  is  a  polling-oriented  protocol  that  uses  a  fetch-store  paradigm  and  trap 
paradigm.  Fetch  is  initiated  by  the  NMS  to  retrieve  values  from  agents  and  monitor 
internal  data  values  and  data  structures  within  the  MIB.  Store  is  initiated  by  the  NMS  to 
change  values  on  agents  and  to  modify  and  control  data  values  and  data  structures  within 
the  MIB;  it  is  also  used  to  control  the  behavior  of  a  NE.  Trap  is  initiated  by  an  agent  to 
asynchronously  report  alarm  conditions  to  the  NMS  when  an  unexpected  event  occurs  on 
a  NE. 
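
The fetch, store, and trap interactions described above can be sketched as follows. This is a toy model rather than an SNMP implementation: the class `ToyAgent`, its method names, and the callback used to deliver traps are assumptions made for illustration, and no network transport is involved. The variable names ifAdminStatus and ifOperStatus are borrowed from MIB-II, where 1 means up and 2 means down.

```python
from typing import Callable, Dict


class ToyAgent:
    """Holds managed values for one network element (NE)."""

    def __init__(self, values: Dict[str, int],
                 trap_sink: Callable[[str, int], None]):
        self._values = dict(values)
        self._trap_sink = trap_sink  # stands in for the NMS trap receiver

    def fetch(self, name: str) -> int:
        """NMS-initiated: read a value (the 'get' family of messages)."""
        return self._values[name]

    def store(self, name: str, value: int) -> None:
        """NMS-initiated: change a value (the 'set' message)."""
        self._values[name] = value

    def device_event(self, name: str, value: int) -> None:
        """Agent-initiated: record an unexpected change and report it."""
        self._values[name] = value
        self._trap_sink(name, value)


received_traps = []
agent = ToyAgent({"ifAdminStatus": 1, "ifOperStatus": 1},
                 lambda name, value: received_traps.append((name, value)))
```

In this sketch the NMS drives `fetch` and `store`, while `device_event` is the only path on which the agent speaks first.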

The  SNMP  protocol  defines  exactly  how  a  NMS  communicates  with  an  agent.  It 
specifies  the  message  formats,  called  Protocol  Data  Units  (PDUs),  that  are  used  by  the 
manager  and  agent  for  requests  and  responses.  It  also  defines  the  exact  meaning  of  the 
request  and  responses.  The  protocol  primarily  uses  UDP  for  transportation  to  keep  the 
communications  simple  and  efficient;  however,  it  can  use  other  transport  protocols  such 
as  TCP  or  HTTP.  Instead  of  defining  a  large  set  of  commands,  the  protocol  uses  the 
fetch-store  paradigm  discussed  earlier.  The  protocol  defines  the  syntax  for  nine 
management  messages:  get,  get-next,  get-bulk,  set,  get-response,  trap,  notification, 
inform,  and  report. 
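
The nine messages can be related to the fetch-store and trap paradigms with a short sketch. The numeric values are the PDU tag numbers used in the SNMP protocol specifications; mapping the names "trap" and "notification" above to the SNMPv1 Trap and the SNMPv2 Trap respectively is an interpretation made here, and the grouping function `paradigm` is hypothetical.

```python
from enum import IntEnum


class PduType(IntEnum):
    """SNMP PDU tag numbers, named after the messages listed above."""
    GET = 0
    GET_NEXT = 1
    GET_RESPONSE = 2
    SET = 3
    TRAP = 4          # SNMPv1 trap
    GET_BULK = 5
    INFORM = 6
    NOTIFICATION = 7  # SNMPv2 trap (assumed mapping)
    REPORT = 8


FETCH = {PduType.GET, PduType.GET_NEXT, PduType.GET_BULK}
STORE = {PduType.SET}
EVENT = {PduType.TRAP, PduType.NOTIFICATION, PduType.INFORM}


def paradigm(pdu: PduType) -> str:
    """Classify a PDU as fetch, store, event, or a reply/report."""
    if pdu in FETCH:
        return "fetch"
    if pdu in STORE:
        return "store"
    if pdu in EVENT:
        return "event"
    return "reply"
```

Only one message changes state on the agent; most of the others are variations on reading data, which is consistent with the protocol's emphasis on polling.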

The  SNMP  paradigm  establishes  a  “control  loop”  that  involves  the  collection  of 
monitoring  data  at  the  NE,  human  interpretation  and  analysis  of  the  computation  at  the 
NMS,  and  the  invocation  of  corrective  actions  at  the  NE  [27].  Figure  4  shows  how  the 
control  loop  stretches  from  the  managed  device  across  the  network  to  the  central  NMS. 
These  control  loops  are  subject  to  failure  when  the  network  experiences  problems  and 
outages. 


Figure  4.  SNMP  Control  Loop 


B.  NETWORK  MANAGEMENT  PROBLEMS  AND  CHALLENGES 

The  dimensions  and  complexities  of  today’s  large  networks  are  outstripping  the 
capabilities  to  manage  them  in  an  efficient  and  cost-effective  manner.  With  the  rapid 
pace  of  technology  advancements,  networks  have  constantly  grown  in  size  and  consist  of 
a  variety  of  heterogeneous  devices  and  legacy  systems.  These  grand  networks  have  a 
large  number  of  nodes  interconnected  by  heterogeneous  transmission  media  (e.g.  wired 
and  wireless)  and  operate  at  accelerated  speeds.  Managing  such  networks  has 
become  increasingly  difficult,  requiring  a  multitude  of  tools  to  centrally  manage  the 
various  open  and  proprietary  network  components.  Often  you  will  find  a  variety  of 
multi-vendor  network  management  platforms  and  tools  being  used  to  manage  a  single 
enterprise  network.  This  is  inadequate  for  the  commercial  world,  as  well  as  for  the 
emerging  high-tech  military,  and  has  not  gone  without  notice. 

A  great  deal  of  ongoing  research  seeks  optimal  network 
management  solutions  for  today’s  sophisticated  network  schemes.  Many  researchers  view 
SNMP  and  the  other  traditional  protocols  as  primitive  and  inadequate  to  meet  the 
demands  of  today’s  and  future  large-scale  networks.  This  section  examines,  discusses, 
and  summarizes  the  various  findings  under  study  regarding  the  limitations  and 
shortcomings  of  the  traditional  models  most  commonly  in  use  today. 

1.  Shortcomings  of  Centralized  Management  Approaches 

The  centralized  nature  of  client-server  network  management  paradigms,  such  as 
SNMP  and  CMIP,  is  a  major  limitation  of  traditional  architectures.  German  S. 
Goldszmidt  conducted  an  elaborate  study  for  his  dissertation  on  “Management  by 
Delegation”  that  chronicles  the  many  shortcomings  of  the  traditional  models,  with  an 
emphasis  on  SNMP  because  of  its  ubiquity.  He  points  out  that  the  client-server  model  is 
too  rigid,  and  thus,  hinders  the  development  of  effective  management  systems  [27]: 

The  implementations  of  these  processes  are  statically  compiled  and  linked. 

A  client  process  in  a  manager  role  can  only  invoke  a  fixed  set  of 
predefined  services.  This  set  cannot  be  modified  or  expanded  without  the 
recompilation,  reinstallation,  and  reinstantiation  of  the  server  process. 

The  SNMP  framework,  for  instance,  was  written  based  on  the  assumption  that  network 
devices  had  limited  computing  resources  and  therefore  had  to  rely  on  a  central 
management  entity  to  perform  intelligent  processing.  The  next  several  paragraphs 
summarize  many  of  the  key  problems  with  centralized  management. 

a.  Scalability  Issues 

In  their  study,  Puliafito  and  Tomarchio  [11]  state  that  “the  rapid  expansion 
of  networks  has  caused  scalability  problems  in  managing  a  larger  number  of  nodes...  the 
larger  number  of  nodes  requires  increased  polling  over  the  network,  and  causes  an 
increase  in  network  traffic  and  bandwidth.”  Goldszmidt  further  points  out  that  each  of 
these  node  interactions  involves  retrieving  and  analyzing  MIB  data,  which  demonstrate 
two  characteristics:  “(1)  it  concentrates  most  processing  in  to  the  manager’s  host 
computer,  and  (2)  it  entails  a  high  degree  of  communications  involving  the  manager’s 
host.”  [27] 

Concentrating  the  bulk  of  processing  on  a  central  management  system  is 
processor  intensive  and  introduces  implicit  limitations  that  are  detrimental  to  the  network 
management  control.  The  NMS  conducts  all  of  the  data  computation  and  presentation, 
which  requires  high  data  access  and  processing  rates  that  do  not  scale  up  for  large  and 
complex  networks.  “There  is  a  limit  on  the  maximum  number  of  variables  that  can  be 
polled  and  the  frequency  of  polling.”  [27]  The  processing  power  available  to  the  NMS 
therefore  limits  how  many  managed  objects  can  be  polled  and  how  often  they  can  be 
polled.  Thus,  the  more  devices  to  be  managed,  the  greater  the  limitation  imposed. 
Finally,  the  centralized  model  creates  a  single  point  of  failure  at  the  NMS  that  could 
eliminate  communication  and  management  interaction  with  all  the  virtually  connected 
managed  devices  [27]. 
Goldszmidt  presents  an  example  about  automating  the  management  of  routers  that 
clarifies  the  potential  scalability  dilemma: 

Consider  an  organization  that  wishes  to  automate  the  management  of  its 
routers.  That  is,  the  organization  wants  to  deploy  programs  that  (1) 
monitor  the  operations  of  the  routers,  (2)  analyze  their  behaviors  and  (3) 
invoke  appropriate  control  functions.  For  example,  suppose  one  wishes  to 
deploy  programs  that  monitor  routing  tables  to  detect  routing  problems 
and  invoke  appropriate  handlers.  When  the  network  is  large  and  fast, 
remote  polling  of  large  routing  tables  may  consume  significant  bandwidth 
resources.  The  NOC  hosts  may  be  unable  to  detect  and  handle  remote 
problems  sufficiently  fast.  Centralization  thus  seriously  limits  the 
scalability  of  a  network  management  system  [27]. 
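
The polling limits discussed above can be made concrete with a rough back-of-the-envelope sketch. It assumes that every polled variable costs one request and one response of a fixed combined size, and that the NMS can process a fixed number of exchanges per second. Both assumptions, the function names, and every number used are illustrative only and are not measurements from the cited studies.

```python
def polling_load_bps(devices: int, variables_per_device: int,
                     bytes_per_exchange: int, interval_s: float) -> float:
    """Average management traffic generated by periodic polling."""
    bits_per_cycle = devices * variables_per_device * bytes_per_exchange * 8
    return bits_per_cycle / interval_s


def max_pollable_devices(exchanges_per_second: int,
                         variables_per_device: int,
                         interval_s: float) -> int:
    """Largest number of devices one NMS can poll within the interval."""
    exchanges_per_cycle = exchanges_per_second * interval_s
    return int(exchanges_per_cycle // variables_per_device)
```

Under these assumptions, polling 50 variables on each of 2,000 devices every 30 seconds at 200 bytes per exchange generates roughly 5.3 Mbps of management traffic, and an NMS that can handle 1,000 exchanges per second can cover at most 600 such devices. Both quantities grow linearly with the number of devices, which is the scalability limit the quoted passage describes.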

An  argument  can  be  made  that  SNMPv2  mitigates  this  problem  by 
enabling  shared  management  processing.  However,  even  with  SNMPv2  the  limitations 
persist,  but  with  a  lesser  impact.  The  SNMPv2  intermediary  managers  each  serve  as  a  single 
point  of  failure  for  the  devices  that  they  manage,  and  they  can  be  overwhelmed 
depending  on  the  number  of  devices  managed  and  the  magnitude  of  interaction  with  the 
managed  agents.  Bandwidth  is  still  wasted  with  the  static  constant  polling  of  the 
managed  agents  that  in  many  cases  is  not  necessary.  Using  distributed  management 
servers  can  be  costly  for  large  networks  that  have  hundreds  of  devices  to  be  managed. 
The  more  devices  there  are  to  be  managed,  the  greater  the  cost  for  additional  intermediate 
managers  to  control  them. 

Scalability  in  a  large  enterprise  network  is  certainly  a  paramount  concern 
when  considering  the  rates  at  which  operators  can  handle  the  volume  of  data  and  alerts. 
Network  administrators  can  easily  become  overwhelmed  by  a  flood  of  messages  that  may 
or  may  not  be  a  cause  for  great  concern.  This  is  more  of  a  problem  as  the  network  grows. 
Ideally,  it  is  more  efficient  for  programmed  software  to  interpret  and  distill  the 
information,  and  react  autonomously  to  relieve  the  human  operator.  However,  the 
limited  capabilities  of  today’s  centralized  protocols  establish  significant  barriers  [27]. 

b.  Reliability  Issues 

It  is  somewhat  of  a  paradox  to  employ  a  centralized  model  to  manage  and 
control  a  network  that  experiences  network  congestion,  delays,  and  failures  that  render 
network  management  and  control  by  the  NMS  helpless.  A  centralized  model  is 
unreliable.  It  is  at  the  mercy  of  the  network.  Managed  devices  cannot  accomplish 
recovery  without  instructions  from  the  NMS.  This  is  due  to  the  agent’s  lack  of 
intelligence,  although  RMON,  which  is  compatible  with  SNMP,  provides  some  degree 
of  mitigation.  RMON  issues  are  discussed  later. 

There  is  a  greater  potential  for  reliability  issues  in  complex,  large-scale 
networks.  “During  times  of  failure,  centralized  management  tends  to  increase  the  rate  of 
data  access  at  a  time  when  the  network  is  least  capable  of  handling  them.”  [27]  “The 
larger  number  of  nodes  requires  increased  polling  over  the  network...  this  becomes  even 
more  of  a  problem  during  high  congestion  periods  when  there  is  a  need  for  management 
actions.”  [11]  This  is  yet  another  paradox  in  that  during  times  of  failure,  when 
management  action  is  vital,  the  NMS  exacerbates  the  problem  by  attempting  to  increase 
the  interaction  with  the  managed  devices,  potentially  causing  a  communications 
“bottleneck.” 

2.  Other  Shortcomings 

a.  SNMP  Deficiencies 

Goldszmidt  provides  an  articulate  summary  of  the  more  technical 
deficiencies  of  SNMP  that  are  worth  presenting  here  [27]: 

SNMP  polling  introduces  significant  delays  in  retrieving  management  data 
to  the  platform.  These  delays  are  due  to:  (1)  transient  conditions,  e.g., 
network  contention  or  congestion,  (2)  configuration  problems,  e.g.,  the 
routing  distance  between  the  devices  and  the  platform,  and  (3)  the  protocol 
design,  e.g.,  the  need  for  ASN.1  parsing  of  management  PDUs  in  both 
communication  endpoints  (device  and  platform).  High  frequency  polling 
introduces  large  bandwidth  overhead.  Slow  polling  will  miss  transient 
spikes  (errors,  load,  etc)  as  it  will  average  it  over  long  periods  of  time. 

SNMP-agent  implementations  introduce  big  timing  errors  in  the 
observations  of  real  devices,  which  produce  outdated,  and  potentially 
erroneous,  data  in  the  agent's  MIBs.  Typically,  MIB  tables  change  while  a 
management  application  is  retrieving  or  examining  them.  Inaccuracies  like 
these  often  lead  to  erroneous  computations. 

The  following  list  outlines  several  of  the  problems  associated  with  SNMP 
implementations: 

•  SNMP  uses  the  network  to  transmit  information  about  network 
measurements.  Thus,  it  introduces  an  intrinsic  disturbance. 

•  When  a  device  is  loaded,  its  SNMP-agent  is  scheduled  with 
relatively  lower  priority,  and  thus  queries  to  it  will  often  be 
delayed. 

•  Event  report  traps  are  unacknowledged  and  an  unreliable  protocol 
(UDP)  is  used  to  deliver  them.  Thus,  an  agent  cannot  be  sure  that  a 
trap  has  reached  its  destination; 

•  The  MIB  model  does  not  support  queries  based  on  object  values  or 
types.  Thus,  applications  cannot  filter  MIB  data  at  its  source,  and 
must  retrieve  large  amounts  of  MIB  data. 

•  Many  implementations  of  SNMP-agents  are  erroneous  and  return 
wrong  data. 
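
The trade-off noted above between high-frequency polling (bandwidth overhead) and slow polling (missed transient spikes) can be illustrated with a small simulation. It assumes that the manager sees only the average rate between two consecutive polls, as it would when differencing a counter. The traffic series and the function name `polled_rates` are invented for illustration.

```python
def polled_rates(per_second_load, interval_s):
    """Average load seen by the manager in each polling interval."""
    rates = []
    for start in range(0, len(per_second_load), interval_s):
        window = per_second_load[start:start + interval_s]
        rates.append(sum(window) / len(window))
    return rates


# One minute of traffic: steady 10 units/s with a 5-second burst of 100.
traffic = [10] * 20 + [100] * 5 + [10] * 35

slow_view = polled_rates(traffic, 60)  # one poll per minute
fast_view = polled_rates(traffic, 5)   # twelve polls per minute
```

With one poll per minute the burst is averaged down to 17.5 units per second, whereas polling every five seconds reveals the full peak of 100 at the cost of twelve times as many requests.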

b.  Heterogeneity  and  Convergence  Difficulties 

The  management  of  heterogeneous  networks  requires  the  capabilities  to 
account  for  events  occurring  on  different  time  scales  and  the  capability  to  aggregate 
different  types  of  data.  In  their  study,  Gurer,  Lakshminarayan,  and  Sastry  [10]  found  that 
there  is  no  apparent  single  NM  technology  available  that  has  the  capability  to  fulfill  the 
real-time  quality  of  service  (QoS)  needs  of  the  various  applications  and  network 
technologies.  Voice,  video,  and  data  each  have  different  timing  requirements  that  must 
be  monitored,  managed  and  controlled  by  the  NM  system.  This  entails  the  simultaneous 
collection  of  different  types  of  data  based  on  the  technology,  analysis,  and  the  appropriate 
decision  being  made  for  that  technology.  Standards  such  as  SNMP  and  CMIP  are  not 
sophisticated  enough  to  conduct  this  type  of  collection,  analysis,  and  proactive 
decision-making.  These  standards  only  provide  simple  data  gathering  and  reporting. 

A  Lucent  Technologies,  Inc.,  research  group  [17]  points  out  that  the 
existence  of  multiple  standards  based  on  the  different  types  of  networks  and  technologies 
(i.e.  voice,  video,  and  data  standards)  is  another  NM  impediment.  These  various  types  of 
networks  and  technologies  have  different  specifications  and  management  requirements 
that  caused  the  creation  of  competing  standards  such  as  SNMP,  CMIP  and 
telecommunications  management  network  (TMN).  The  competing  standards  have  led  to 
different  communities  adopting  different  standards.  The  data  community  has  generally 
adopted  the  SNMP  standard,  while  the  telecommunications  community  mostly  adopted 
CMIP  and  in  some  cases  the  TMN  protocol  standard. 

c.  Remote  Monitoring  Shortcomings 

In  a  research  study,  Gavalas,  Ghanbari,  and  O’Mahony  [13]  state 
that  the  RMON  distributed  management  model  also  presents  some  considerable  NM 
limitations  and  issues.  Although  RMON  reduces  the  amount  of  traffic  and 
processing  burden  on  the  NMS,  it  is  still  based  on  a  client-server  centralized  architecture. 
In  other  words,  the  real  intelligence  and  processing  still  reside  in  the  NMS.  Since  a 
single  RMON  device  is  required  to  monitor  the  traffic  of  a  single  network  segment,  it 
becomes  very  costly  as  the  number  of  segments  increases.  Because  the  RMON 
probe  can  only  be  set  or  modified  during  configuration,  it  is  very  inflexible  for 
dynamic  changes  during  runtime.  Finally,  RMON  is  limited  to  providing  only 
traffic-oriented  statistics  as  opposed  to  node-oriented  statistics. 

d.  Market  Findings 

A  comprehensive  study  of  the  best  enterprise  network 
management  products  on  the  market  is  hard  to  come  by.  The  reason  is  perhaps  that 
there  is  no  true  enterprise  network  management  solution  that  incorporates  the  full 
spectrum  of  NM  functionality.  This  is  consistent  with  what  the  many  researchers  have 
found  regarding  the  NM  shortcomings.  This  is  essentially  the  reason  why  the  military 
cannot  simply  buy  a  commercial  NM  product  off-the-shelf  to  fulfill  their  enterprise 
network  management  requirements.  Consider  the  following  comments  made  in  a  recent 
article  addressing  the  limitations  of  SNMP  to  comprehensively  achieve  FCAPS  [35]: 

...These  limitations  partly  revolve  around  what  can— or  can’t— be  done 
with  SNMP.  As  a  standard,  SNMP  is  widely  installed  and  leveraged  by 
network-management  vendors  but  is  primarily  limited  to  addressing  the 
fault  and  performance  portions  of  FCAPS.  Frankly,  even  expensive 
network-management  products  are  hamstrung  when  they  rely  solely  on 
SNMP  data,  which  is  why  many  have  proprietary  agents. 

Iosif  G.  Ghetie  [12]  conducted  a  market  study  on  major  marketed  NM 
products  that  revealed  a  lack  of  cooperation  and  integration  between  NM  applications. 
Here  are  some  of  the  noteworthy  findings  from  the  study  that  still  prevail  today: 

•  None  of  the  products  are  able  to  fully  cover  all  the  network  management 
areas.  Their  main  focus  is  on  network  monitoring  and  event  reporting. 

•  None  of  the  products  are  able  to  easily  manage  heterogeneous  networks. 

•  Scaling  is  difficult  due  to  the  inadequacy  of  the  application  development 
tools. 

•  The  systems  are  very  resource  consuming  (e.g.  SNMP  polling). 

•  All  are  expensive. 

3.  Final  Point 

The  problems  and  issues  discussed  above  regarding  the  prevalent  network 
management  models  will  impose  limiting  factors  on  the  GIG  and  the  Army’s  enterprise 
management  initiative.  This  chapter  defined  the  current  NM  standards 
and  exposed  the  inherent  weaknesses  that  will  limit  the  military’s  efforts. 
Without  more  flexible,  adaptable,  and  scalable  protocol  standards,  the  GIG  vision 
will  probably  not  be  fully  realized.  The  shortcomings  of  the  modern-day 
protocol  standards  can  result  in  failure  to  reach  the  full  spectrum  dominance 
objective,  which  translates  to  handicapping  the  warfighters  and  possibly  compromising 
battlefield  successes.  Emerging  intelligent-agent-based  technologies  and  multi-agent 
systems  offer  robust  capabilities  that  seemingly  provide  the  necessary  characteristics 
required  to  manage  enterprise  networks  as  vast  as  the  proposed  GIG  and  Army  AEI. 




III.  INTELLIGENT  AGENTS  AND  MULTI-AGENT  SYSTEMS 


A.  THE  AGENT  WORLD 

The  complexities  of  network  management  are  growing  beyond  the  capabilities  of 
the  current  centralized  and  distributed  NM  systems.  These  classical  approaches  lack  the 
adaptability,  flexibility,  reliability,  and  scalability  that  are  necessary  to  manage  the  vast 
complexity  inherent  in  today’s  and  future  enterprise  infrastructures.  Heterogeneous  enterprise 
network  environments  are  laced  with  legacy  systems,  proprietary  solutions,  and  disparate 
open  standard  NM  protocols.  Even  modern-day  technologies  are  lacking.  “Emerging 
technologies,  like  CORBA  for  example,  do  not  seem  to  be  able  to  solve  problems  of 
complexity,  cost  and  scalability”  [14]. 

Intelligent  Agent  (IA)  technologies  are  on  the  horizon  and  appear  to  be  a  most 
promising  solution  to  resolve  many  of  the  enterprise  NM  pitfalls.  This  section  gives  an 
overview  of  intelligent-agent-based  technologies  and  Multi-agent  Systems  (MAS). 

1.  What  is  an  Agent? 

There  is  unsettled  debate  within  the  agent  communities  as  to  what  an  agent  really 
is.  There  are  different  schools  of  thought  from  the  various  disciplines 
such  as  the  artificial  intelligence  (AI)  or  computer  science  communities.  Certain 
attributes  may  be  of  more  importance  in  one  discipline  than  in  another.  For  example,  in 
some  disciplines,  the  ability  for  agents  to  learn  from  their  experiences  is  of  paramount 
importance,  while  in  others  it  is  not.  However,  most  disciplines  commonly  agree  that 
autonomy  is  central  to  the  notion  of  agency  [16]. 

Fundamentally,  an  agent,  in  the  context  of  software,  is  a  self-contained 
software  program  module  that  is  programmed  to  carry  out  certain  actions  on  behalf  of  a 
human  user  or  other  software  entity  in  a  certain  software  environment.  The  agent  can 
perform  such  things  as  searching  for  information,  negotiating  services,  executing 
specified  tasks,  or  collaborating  with  other  agents.  These  actions  are  conducted  in  an 
autonomous  fashion  that  requires  little  or  no  human  intervention. 

A  highly  recognized  definition  of  an  agent  comes  from  Michael  Wooldridge  and 
N.  R.  Jennings  [15]:  An  agent  is  a  computer  system  that  is  situated  in  some  environment, 
and  that  is  capable  of  autonomous  action  in  this  environment  in  order  to  meet  its  design 
objectives.  Autonomous  in  this  sense  means  that  agents  are  able  to  act  without  the 
intervention  of  humans  or  other  systems:  they  have  control  both  over  their  own  internal 
state,  and  over  their  behavior.  “Another  common  view  of  an  agent  is  that  of  an  active 
object  or  bounded  process  with  the  ability  to  perceive,  reason,  and  act.”  [18]  Keep  in 
mind  that  the  definitions  above  refer  to  the  general  concept  of  “agents”  and  not 
necessarily  “intelligent  agents,”  which  is  defined  later.  An  unintelligent  agent  is 
distinguished  from  an  intelligent  agent  based  on  the  agent’s  properties. 
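
The "perceive, reason, and act" view quoted above can be sketched in a few lines. This is only an illustration of the cycle and of an agent owning its internal state. The class `LinkAgent`, the dictionary used as the environment, and the rerouting rule are all hypothetical and are not drawn from the cited definitions.

```python
class LinkAgent:
    """Toy agent that keeps a link's load under a limit on its own."""

    def __init__(self, limit):
        self.limit = limit
        self.history = []  # internal state controlled only by the agent

    def perceive(self, environment):
        """Sense the part of the environment the agent cares about."""
        return environment["load"]

    def reason(self, load):
        """Decide what to do without asking a human or another system."""
        self.history.append(load)
        return "reroute" if load > self.limit else "idle"

    def act(self, action, environment):
        """Change the environment in pursuit of the design objective."""
        if action == "reroute":
            excess = environment["load"] - self.limit
            environment["load"] -= excess
            environment["backup_load"] += excess

    def step(self, environment):
        """One pass through the perceive, reason, act cycle."""
        action = self.reason(self.perceive(environment))
        self.act(action, environment)
        return action
```

Calling `step` on an environment whose load is 90 with a limit of 70 moves the excess of 20 to the backup path; a second call finds nothing to do. No external manager instructs the agent at either point.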

Looking  back  at  SNMP,  recall  that  the  managed  device  contains  a  software  entity 
called  an  agent.  However,  the  SNMP-defined  agent  does  not  qualify  as  an  agent  in 
the  sense  of  the  preceding  paragraphs.  The  SNMP  agent  has  absolutely  no 
degree  of  autonomy  or  intelligence.  The  SNMP  framework  was  written  with  the 
assumption  that  network  devices  have  limited  computing  resources  available,  and 
therefore  must  rely  on  the  NMS  for  instructions  and  computational  processing.  Even 
considering  the  SNMP  Trap,  the  agent  is  limited  to  a  fixed  set  of  predefined  thresholds 
that  are  statically  implemented.  This  exemplifies  the  wide  abuse  of  the  term  “agent.” 

2.  Agent  Properties 

Considering  the  different  perspectives  on  the  definition  of  an  agent,  it  is  useful  to 
look  at  the  varying  properties  (capabilities)  that  an  agent  can  assume.  Cheikhrouhou,  et 
al.  [14],  provide  a  list  of  several  properties  that  commonly  characterize  agents: 

•  Autonomy.  Self-government,  independence:  Branch  managers  have  full 
autonomy  in  their  own  areas  (Oxford  Advanced  Learner’s  Dictionary). 
The  agent  decides  himself  when  and  under  which  condition  he  will 
perform  what  actions.  An  autonomous  agent  is  a  system  situated  within 
and  as  a  part  of  an  environment  that  senses  that  environment  and  acts  on 
it,  over  time,  in  pursuit  of  its  own  agenda  so  as  to  effect  what  it  senses  in 
the  future. 

•  Communication.  One  of  the  key  properties  of  agents  is  the  ability  to 
speak  with  a  peer,  with  a  human  (Interface  Agents),  or  with  a  device.  The 
following  communications  between  agents,  called  languages,  are  often 
used: 

o  Blackboard:  Agents  read  and  write  messages  in  a  shared  location, 
called  a  blackboard. 

o  KQML:  Knowledge  Query  and  Manipulation  Language  is  a 
language  and  protocol  for  exchanging  information  and 
knowledge  using  what  is  known  as  “performatives” 
(discussed  later). 

o  KIF:  Knowledge  Interchange  Format. 

o  COOL:  structured  conversation,  KQML-based,  which  is 
used  for  the  coordination  of  agents. 

•  Collaboration/Cooperation.  Agents  are  collaborative  when  they  are  able  to 
work  together.  The  agent  is  able  to  communicate  and  negotiate  with 
others;  it  is  deliberative  and  may  coordinate  its  actions  with  others. 
Collaborative  agents  are  particularly  useful  when  a  task  involves  several 
systems  on  the  network.  Negotiation  is  the  main  issue  for  collaborative 
agents.  While  coordination  can  occur  without  collaboration,  collaboration 
needs  negotiation. 

•  Deliberation.  Know  rules,  and  apply  them  without  waiting  for 
instructions.  Wooldridge  and  Jennings  define  a  deliberative  agent  as  “one 
that  contains  an  explicitly  represented,  symbolic  model  of  the  world,  and 
in  which  decisions  (...)  are  made  via  logical  (or  at  least  pseudo-logical) 
reasoning,  based  on  pattern  matching  and  symbolic  manipulation.” 

•  Mobility. Since the arrival of Java portability, a number of mobile agent
models  have  surfaced.  But,  there  are  different  kinds  of  mobility  defined: 

o  The  mobility  that  allows  the  agent  to  move  from  one  system  to  a 
similar  one. 

o  The  mobility  that  allows  the  agent  to  move  to  another  different 
system. 

o  The  mobility  that  allows  agents  to  suspend  their  action  on  one 
system,  move  to  another  and  go  on. 

o  The  mobility  that  allows  the  agent  to  move  itself,  rather  than  being 
transported. 

o  The  mobility  that  is  a  duplication  of  the  agent  to  another  system 
(cloning). 

o  The mobility that allows the agent to carry its knowledge to another
system. 

Generally,  mobility  turns  out  to  be  a  mixture  of  these  definitions.  One  of 
the  main  issues  surrounding  mobility  is  the  potential  security  weakness  of 
mobile  agents. 

•  Learning. Learning is the ability of an agent to acquire knowledge and
use it to modify its behavior. Despite the fact that learning is an important
factor of intelligence, few agents are able to learn. Most often they have
fixed (pre-compiled) rules and knowledge bases. The objective of learning
is for the agent to perform new tasks dynamically without being stopped.
Different ways of learning have been studied and tested:

o  Generalization:  you  observe  your  environment  and  deduce  rules. 

o  Instruction:  you  obtain  knowledge  and  rules  from  others  (transfer). 

•  Pro-activeness.  Pro-active  actions  are  intended  to  cause  changes,  rather 
than  just  reacting  to  change.  Pro-active  agents  generally  follow  plans,  or  at 
least  execute  rules  when  the  environment  reaches  a  known  threshold. 
Sometimes pro-active is used with the same meaning as deliberative, but
an agent may be pro-active because it has been requested to perform
pro-active tasks, as opposed to deliberative agents, which decide for
themselves to be pro-active.

•  Reactivity.  Do  something  when  an  event  occurs. 

•  Security.  Be  able  to  discriminate  friends  from  enemies  and  contaminated 
elements. 

•  Planning.  The  agent  organizes  by  priorities  the  actions  to  perform  during 
its  life.  For  many  researchers  planning  is  one  of  the  most  important 
properties  for  an  intelligent  agent  to  possess.  Planning  is  used  by 
deliberative  and  pro-active  agents  according  to  their  knowledge  of  the 
environment  and  the  possible  actions  that  they  can  apply  to  it. 

•  Delegation. An agent may ask another agent to perform one of its goals
or  tasks.  This  capacity  is  very  important  for  balancing  resources. 


It is important to note that the descriptions above are not all-inclusive. The
various disciplines would defend their own positions as to which descriptions constitute
their idea of an agent. But, as can be seen, the properties of agents are quite extensive and
sophisticated. Applying intelligent agents to the NM arena is a matter of incorporating
the properties necessary to deal with the NM complexities involved in
future enterprise networks.

3.  What  is  an  Intelligent  Agent? 

Just as with the debate over the general agent definition, there is no universally
agreed upon definition of an intelligent agent. Considering the preceding
definition of an agent and the various properties, an intelligent agent assumes all the
characteristics of the agent definition but also assumes a mixture of the properties,
determined by the IA design or by the beliefs of a particular
community. For example, Wooldridge [16] defines an IA as one that is capable of
flexible autonomous action in order to meet its design objectives, where flexibility means
three things:

•  reactivity:  intelligent  agents  are  able  to  perceive  their 
environment,  and  respond  in  a  timely  fashion  to  changes  that  occur 
in  it  in  order  to  satisfy  their  design  objectives; 

•  pro-activeness:  intelligent  agents  are  able  to  exhibit  goal-directed 
behavior  by  taking  the  initiative  in  order  to  satisfy  their  design 
objectives; 

•  social  ability:  intelligent  agents  are  capable  of  interacting  with 
other  agents  (and  possibly  humans)  in  order  to  satisfy  their  design 
objectives. 

Different  kinds  of  intelligent  agents  have  different  subsets  of  properties.  Thus, 
the various disciplines have different views of an intelligent agent. The following
passage sums up the general IA context:

Most  agree  though  that  to  be  intelligent,  agents  must  include  the  ability  to 
operate  in  real-time  and  communicate  using  natural  language.  Along  with 
this,  they  must  be  able  to  learn  from  their  environment  and  be  capable  of 
adaptive  goal-oriented  behavior.  In  other  words,  intelligent  agents  need  to 
work  together  on  a  user-specified  problem  when  told  to  do  so  and  must  be 
able  to  do  this  successfully  in  a  dynamic  environment.  Importantly,  the 
agent  must  communicate  to  the  user,  in  a  language  he  or  she  understands, 
that  the  task  has  been  successfully  completed  or  that  it  has  been  otherwise 
terminated.

4.  Multi-Agent Systems

The  real  benefit  of  agent  technologies  is  leveraging  the  capabilities  of  multiple 
agents that have different functions and have the ability to interact with other agents (or
humans) to solve problems and execute tasks. A multi-agent system (MAS) is a subset of
Distributed Artificial Intelligence (DAI). DAI is concerned with problem-solving where
agents  solve  tasks  in  a  collaborative  manner,  in  a  distributed  environment. 

A  MAS  is  a  platform  composed  of  multiple  agents  that  interact  to  solve  problems 
beyond  their  individual  capabilities  or  knowledge.  Interaction  [18]  is  everything  that 
occurs  between  agents  (agent-agent  interaction)  and  their  environment  (agent- 
environment  interaction).  Agents  can  interact  directly  via  verbal  communication  (e.g.  by 
providing information in which other agents are interested or which confuses other
agents) or indirectly via their environment (e.g. by observing one another or by carrying
out an action that modifies the environmental state).

The  idea  of  MASs  is  to  distribute  functionality  and  intelligence  in  a  decentralized 
manner where agents can dynamically solve exclusive problems only when necessary,
providing flexibility, adaptability, and efficiency. The strength of MASs is agents
collaborating  and  cooperating  to  collectively  solve  large,  complex  problems  that  are 
beyond  the  capabilities  of  any  single  agent.  This  point  counters  the  architectural  nature 
of  the  traditional  NM  protocols  where  the  agent-NMS  interaction  is  inflexible  and  rigidly 
defined. 


An MAS has the following advantages over a single agent or centralized approach [46]:


•  An  MAS  distributes  computational  resources  and  capabilities  across  a 
network  of  interconnected  agents.  Whereas  a  centralized  system  may  be 
plagued  by  resource  limitations,  performance  bottlenecks,  or  critical 
failures,  an  MAS  is  decentralized  and  thus  does  not  suffer  from  the  "single 
point  of  failure"  problem  associated  with  centralized  systems. 

•  An  MAS  allows  for  the  interconnection  and  interoperation  of  multiple 
existing  legacy  systems.  By  building  an  agent  wrapper  around  such 
systems,  they  can  be  incorporated  into  an  agent  society. 

•  An MAS models problems in terms of autonomous interacting
component-agents,  which  is  proving  to  be  a  more  natural  way  of 
representing  task  allocation,  team  planning,  user  preferences,  open 
environments,  and  so  on. 

•  An  MAS  efficiently  retrieves,  filters,  and  globally  coordinates  information 
from  sources  that  are  spatially  distributed. 

•  An  MAS  provides  solutions  in  situations  where  expertise  is  spatially  and 
temporally  distributed. 

•  An MAS enhances overall system performance, specifically along the
dimensions  of  computational  efficiency,  reliability,  extensibility, 
robustness,  maintainability,  responsiveness,  flexibility,  and  reuse. 


Wooldridge  [16]  discusses  two  contrasting  patterns  of  coordination  when  agents 
interact  -  cooperation  and  competition.  In  cooperation  several  agents  work  together  by 
drawing on their knowledge and capabilities to achieve a goal. These agents fail or
succeed together because they try to accomplish collectively what individual agents
cannot. With competition, agents’ goals are conflicting so they work against each other.
Competitive agents try to maximize their own benefit at the expense of other agents; thus,
the success of one implies the failure of others.

5.  Agent  Architecture 

The foregoing discussion provides only a cursory overview of the
principles of agents. The theory and architectural makeup of agents are much
more complex and ambiguous. This difficult subject matter is beyond the scope of this
study; however, it is important to understand that there is much more involved in the
architecture of an agent. There are several approaches for designing and developing
agents. An agent architecture [18] is:

a  particular  methodology  for  building  agents.  More  generally,  the  term  is 
used  to  denote  a  particular  arrangement  of  data  structures,  algorithms,  and 
control  flows,  which  an  agent  uses  in  order  to  decide  what  to  do.  Agent 
architectures  can  be  characterized  by  the  nature  of  their  decision  making. 
Example  types  of  agent  architectures  include  logical-based  architectures 
(in  which  decision  making  is  achieved  via  logical  deduction),  reactive 
architectures  (in  which  decision  making  is  achieved  via  simple  mapping 
from  perception  to  action),  belief-desire-intention  architectures  (in  which 
decision making is viewed as practical reasoning of the type that we
perform  every  day  in  furtherance  of  our  goals),  and  layered  architectures 
(in  which  decision  making  is  realized  via  the  interaction  of  a  number  of 
task  accomplishing  layers). 
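
As a rough illustration of the simplest type named in the quotation above, a reactive architecture can be sketched as a direct lookup from percept to action. The class name ReactiveAgent, the percept names, and the action names below are hypothetical and were chosen only to give the sketch a network management flavor; they are not taken from [18].

```python
from typing import Dict


class ReactiveAgent:
    """Reactive architecture: a decision is a direct percept -> action mapping."""

    def __init__(self, rules: Dict[str, str], default_action: str = "do_nothing"):
        self.rules = rules                    # hypothetical condition/action table
        self.default_action = default_action  # used when no rule matches

    def decide(self, percept: str) -> str:
        # No world model and no planning: simply look the percept up.
        return self.rules.get(percept, self.default_action)


# Hypothetical rules for a network monitoring agent.
agent = ReactiveAgent({
    "link_down": "raise_alarm",
    "high_latency": "run_traceroute",
})

print(agent.decide("link_down"))    # raise_alarm
print(agent.decide("cpu_normal"))   # do_nothing
```

The other architecture types in the quotation would replace the table lookup in decide() with logical deduction, practical reasoning over beliefs and goals, or a stack of interacting layers.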

6.  Agent  Communication 

Once agents are created, they need a mechanism for communicating with other
agents, applications, and human users. Agent communication [17] is accomplished with
three  components:  ontology,  content  language,  and  agent  communication  language 
(ACL).  Agents  use  ontologies  to  limit  the  scope  of  their  interactions  and  focus  on  a 
specific  world  of  understanding.  The  content  language  is  used  for  information  encoding 
through  statements  about  the  domain,  which  combine  terms  from  the  corresponding 
ontology into meaningful sentences. The ACL acts as a formalism for exchanging
messages. 

One of the most often used ACLs for communication exchange is
the Knowledge Query and Manipulation Language (KQML) [14] (the ACL of the Foundation
for Intelligent Physical Agents (FIPA) is another). KQML was developed under the Defense
Advanced Research Projects Agency (DARPA) Knowledge Sharing Initiative. KQML is
a protocol for exchanging information and knowledge.

KQML  is  based  on  speech  act  theory  [14],  which  is  founded  on  the  idea  that  with 
language  you  not  only  make  statements,  but  also  perform  actions,  such  as  requests, 
suggestions,  commitments,  and  replies.  For  example,  when  you  request  something  you 
do  not  just  report  on  a  request,  but  you  actually  effect  the  request.  These  performance 
actions,  or  requests,  are  called  performatives.  Example  performative  verbs  include 
promise,  report,  convince,  insist,  tell,  request,  and  demand. 

There are three main aspects of a speech act [19]:

The  locution  refers  to  the  lowest  level  of  the  speech  act,  namely,  the  string 
that  is  transmitted.  The  illocution  refers  to  the  intrinsic  meaning  of  the 
speech  act.  The  perlocution  refers  to  the  possible  effects  of  the  speech  act 
on  the  recipients.  The  locution  can  be  varied  and  the  perlocutions  depend 
on the recipient. However, the illocution tells us the meaning that is
conveyed. 

KQML  divides  communication  into  illocutionary  categories  [14]:  assertives 
(statements  of  fact),  directives  (commands  or  requests),  declaratives  (announcements  of 
actions taken), commissives (commitments) and expressives (expressions of emotion).
For  example,  KQML  uses  performatives  such  as  tell,  which  asserts  a  belief;  deny,  which 
asserts  a  disbelief;  ask-if,  which  requests  information;  error,  which  asserts  a  message  was 
read  incorrectly;  and  sorry,  which  asserts  that  a  reply  or  task  cannot  be  undertaken. 

The fundamental objective of agent interaction is to separate the semantics of
the communication protocol from the semantics of the enclosed message. While the
semantics of the communication protocol must be domain independent, the semantics of
the enclosed message may depend on the domain. The communication protocol must be
universally shared by all agents. With KQML, all the information for understanding the
content of the message is encapsulated in the communication itself. The KQML protocol
has this basic structure [19]:

(KQML-performative
    :sender <word>
    :receiver <word>
    :language <word>
    :ontology <word>
    :content <expression>
    ...)

The first line of this format indicates the KQML performative (e.g. “tell”). As
discussed above, since KQML is based on speech act performatives, the semantics of
KQML performatives are domain independent. Think of the KQML performative as a
header that wraps (encapsulates) the message information for exchange. The wrapped
message may be domain dependent in order for the recipient to understand it. The
:sender and :receiver fields identify the sender and the receiver of the message. The
:language field identifies the language in which the message is expressed. The :ontology
field denotes the ontology that contains the vocabulary necessary for collaboration and
comprehension. Finally, the :content field is the message or instruction itself. Note that
there are other fields, such as :in-reply-to, that can also be embedded in the structure [19].
Here is an example [10]:

(ask-if
    :sender ProblemDetectionAgent
    :receiver Control_Agent
    :language C++
    :ontology ActionRequest
    :in-reply-to NA
    :content "Initiate(Traceroute (Node_36))")

In this example, the Problem Detection Agent is requesting that the Control
Agent initiate a traceroute to Node_36 and return the result to the Problem
Detection Agent.
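
To make the wrapper idea concrete, the following is a minimal sketch of how an agent might assemble and take apart a flat message such as the ask-if example above. The helper names format_kqml and parse_kqml are hypothetical, and the parser is an assumption made for illustration: it handles only one level of :keyword fields whose values are either bare words or double-quoted strings, which is far less than a full KQML implementation would accept.

```python
import re
from typing import Dict, Tuple


def format_kqml(performative: str, fields: Dict[str, str]) -> str:
    """Render a performative and its :keyword fields as one KQML-style string."""
    parts = [f":{key} {value}" for key, value in fields.items()]
    return "(" + " ".join([performative] + parts) + ")"


# A field value is either a double-quoted string or a run of characters
# containing no whitespace and no parentheses.
_FIELD = re.compile(r':([\w-]+)\s+("[^"]*"|[^\s()]+)')


def parse_kqml(message: str) -> Tuple[str, Dict[str, str]]:
    """Split a flat KQML-style message into (performative, fields)."""
    body = message.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("message must be wrapped in parentheses")
    body = body[1:-1].strip()
    if not body:
        raise ValueError("message has no performative")
    performative = body.split(None, 1)[0]
    fields = {key: value for key, value in _FIELD.findall(body)}
    return performative, fields


msg = format_kqml("ask-if", {
    "sender": "ProblemDetectionAgent",
    "receiver": "Control_Agent",
    "language": "C++",
    "ontology": "ActionRequest",
    "in-reply-to": "NA",
    "content": '"Initiate(Traceroute (Node_36))"',
})
performative, fields = parse_kqml(msg)
print(performative, fields["receiver"], fields["content"])
```

Note that the parser never looks inside the :content value. Interpreting it is left to whatever understands the named language and ontology, which mirrors the separation between protocol semantics and message semantics described earlier.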

In  KQML,  agents  communicate  either  synchronously  or  asynchronously.  The 
difference  is  that  for  synchronous  communication  the  sending  agent  waits  for  a  reply, 
whereas  with  asynchronous  communication  it  continues  with  its  reasoning  or  acting  until 
it  receives  a  reply.  There  are  many  other  dynamics  that  are  within  the  KQML  protocol 
that  enable  agent  communication  [19]. 
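
The difference between the two modes can be sketched in a few lines. This is only an illustration of the control flow, not of KQML itself: the function names are hypothetical, the responding agent is simulated by an ordinary function with a short delay, and a worker thread stands in for the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def control_agent(request: str) -> str:
    """Hypothetical responder that takes a moment to answer."""
    time.sleep(0.01)
    return f"done: {request}"


def send_synchronously(request: str) -> List[str]:
    log = []
    log.append(control_agent(request))   # sender blocks here until the reply arrives
    log.append("other work")
    return log


def send_asynchronously(request: str) -> List[str]:
    log = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(control_agent, request)  # send and move on
        log.append("other work")                       # reasoning or acting continues
        log.append(pending.result())                   # reply is picked up later
    return log


print(send_synchronously("traceroute"))    # ['done: traceroute', 'other work']
print(send_asynchronously("traceroute"))   # ['other work', 'done: traceroute']
```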

B.  AN ARGUMENT FOR INTELLIGENT AGENTS IN NETWORK MANAGEMENT

A group of researchers at Lucent Technologies, Inc. [17] created an agent-based
platform called LucINA (Lucent Intelligent Agent Network) to test and evaluate
intelligent-agent-based NM and other agent-related technologies. Based on their research and
experimentation  they  have  made  a  compelling  argument  for  intelligent-agent-based 
technologies  as  a  premier  solution  for  NM  of  large  heterogeneous  enterprise  networks. 
The  following  paragraphs  synopsize  their  findings. 

1.  Dynamism 

The growing incompatibility of multi-vendor equipment and dynamic changes in
network topologies have caused increased complexity and dramatic structural changes in
network architectures. The Lucent group found that IA systems are better suited for
managing such dynamically changing environments. They concluded that agent-based
systems handle dynamism in a natural way because these types of platforms provide for
a controlled agent life cycle. Agents can be added to the system at will, via registration
mechanisms. An agent can advertise itself and its services, making network discovery
automatic. The meta-level facilities are used to dynamically discover supported
communication parameters such as language, protocol, and ontology.
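
The registration and advertisement mechanism can be pictured with a small sketch. It does not describe how LucINA works internally, which [17] would have to confirm; the AgentDirectory class, the agent names, and the service names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AgentDirectory:
    """Hypothetical registry in which agents advertise the services they offer."""

    def __init__(self) -> None:
        self._services: Dict[str, Set[str]] = defaultdict(set)

    def register(self, agent_name: str, services: List[str]) -> None:
        # An agent joins at run time by advertising what it can do.
        for service in services:
            self._services[service].add(agent_name)

    def deregister(self, agent_name: str) -> None:
        # An agent leaves without any other component being rebuilt.
        for providers in self._services.values():
            providers.discard(agent_name)

    def find(self, service: str) -> List[str]:
        # Other agents discover providers by service name.
        return sorted(self._services.get(service, set()))


directory = AgentDirectory()
directory.register("RouterAgent_1", ["traceroute", "ping"])
directory.register("SwitchAgent_7", ["ping"])
print(directory.find("ping"))         # ['RouterAgent_1', 'SwitchAgent_7']

directory.deregister("RouterAgent_1")
print(directory.find("traceroute"))   # []
```

The point of the sketch is that adding or removing an agent changes only the directory contents, whereas the SNMP and CMIP frameworks discussed next require recompilation and offline time.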

Traditional  and  other  management  frameworks  are  inflexible  and  limited 
compared  to  agent-based  systems.  Frameworks  based  on  SNMP  and  CMIP  require 
software  recompilation  to  handle  changes,  and  offline  time  is  required  to  activate  new  or 
modified  software.  CORBA-based  systems  implemented  with  solely  static  invocations 
suffer from the same problems. The use of the dynamic invocation interface (DII) with
CORBA presents a time-consuming nuisance relative to agent-based systems: it is
cumbersome for programmers to code. Web-based solutions are completely reactive,
which  means  that  if  the  NMS  doesn’t  ask  for  data  then  it  will  not  be  delivered. 

2.  Multiple  Standards 

The  diverse  user  requirements  for  the  various  network  technologies  have  led  to 
the  creation  of  several  different  network  management  standards  and  proprietary  products 
(this  is  discussed  in  Chapter  II).  At  the  highest  level,  agents  use  a  uniform 
communication means (e.g. KQML) for interaction between heterogeneous agents. Any
other standard can be applied at the content level. No other network management
technology has this degree of flexibility.

3.  Interoperability 

Network  management  interoperability  issues  have  resulted  from  the  development 
of  dissimilar  network  management  information  models  and  manager-agent 
communication protocols. In agent-based systems the meta-layer for exchanging
communication acts allows for coherent exchange of data at the content level. Agents
know exactly what kind of data to expect and what the meaning is. This allows for
agent-to-agent understanding in which data are processed automatically without prior
arrangements.  Ontologies  serve  to  mitigate  interoperability  issues  by  structuring  data, 
describing  relationships  and  rules  governing  the  data,  as  well  as  processing  algorithms. 
The  Lucent  research  group  asserts  that  ontologies  are  much  easier  to  standardize  than 
traditional  standards  because  they  are  relatively  smaller  in  size. 

Interoperability among the other NM technologies still suffers from unresolved
issues. The traditional NM standards still have many incompatibility issues that can only
be resolved by modifying code. CORBA does not have a negotiation mechanism that
allows for completely ad hoc use of arbitrarily added objects. In the case of Web-based
solutions,  there  is  no  way  of  implementing  XML  standardization  without  implementing  a 
Web  server  and  a  Web  browser  as  de  facto  negotiating  agents. 

4.  Distribution 

Current solutions for distributed management, such as RMON, are perhaps not
truly distributed due to their inherently centralized nature. CORBA, as a distributed NM
solution,  suffers  from  many  problems  such  as  information  bottlenecks  and  single  points 
of  failure.  The  advantage  of  an  agent-based  system  is  that  it  is  inherently  distributed. 
The  communication  language  provides  for  natural  collaboration  between  agents  at 
various  levels.  This  allows  for  a  bottom-up  approach  to  problem  solving  and  utilization 
of  available  services  at  any  level  and  stage.  The  mobility  of  agents  is  also  an  advantage. 
Mobility  and  intelligent  distributed  processing  can  provide  efficiencies  such  as  lower 
bandwidth requirements for network management.

5.  Security

As enterprise networks continue to grow in complexity, the stakes for security
become increasingly high. Because of the inherent vulnerabilities of software
security, there is no clear advantage that IA technologies have over other technologies.
The agent-based framework does allow for security schemes at several layers. This
framework makes it convenient to enforce security because agents decide at run time whether a
received  request  will  be  fulfilled.  All  the  other  current  technology  frameworks  provide 
security  mechanisms  because  the  industries  mandate  it  within  the  standards.  SNMP  and 
CMIP  standards  mandate  security  schemes  that  have  to  be  implemented  at  the  design  and 
deployment  phases.  CORBA  provides  security  mechanisms  integrated  with  the  platform. 
Web-based  approaches  depend  on  the  security  implemented  by  Web  browsers  and 
servers. 

6.  User  Experience 

As  networks  continue  to  expand,  more  and  more  users  will  become  involved  in 
some  form  of  network  management.  Web-based  solutions  have  addressed  this  by 
providing user-friendly graphical user interfaces (GUIs) that simplify NM for the
common  user.  The  Lucent  group  suggests  that  agent-based  systems  stand  out  because 
they  deliver  plug-and-play  networks.  Network  elements  do  not  have  to  be  provisioned 
because  the  agents  residing  on  them  can  negotiate  the  conditions  for  incorporating  them 
into  the  network.  Similarly,  services  can  be  plugged  into  the  network  and  made  available 
automatically  due  to  negotiation  and  directory  services. 

7.  Rapid  Software  Delivery 

Today’s competitive marketplace demands rapid design, deployment, and
maintenance of software products, as well as control of the costs of expansions and
modifications. Therefore, systems have to be designed, implemented, and deployed
quickly. Agent-based systems are very promising in this regard. They are designed with
high-level concepts because the platform handles most low-level technicalities such as
message passing through method invocation. Thus, systems can be designed by network
management experts as opposed to programmers. In contrast, architectures built on top of
SNMP and CMIP standards are the most expensive to design, implement, deploy, and
maintain. Modifications require plenty of changes to the configuration data as well as
off-line  time.  To  accommodate  new  advances  in  standardization,  software  components 
have  to  be  recompiled  and  relinked.  Similar  criticism  is  applicable  to  Web-based 
approaches.  CORBA-based  systems  can  take  advantage  of  well-established  object- 
oriented  analytical  and  design  methodologies  and  development  tools.  Systems  that  take 
full  advantage  of  these  capabilities  are  flexible  and  easy  to  upgrade. 

8.  Cost 

Cost  is  a  major  factor  for  any  organization  when  deciding  to  procure  a  software 
system (especially when buying enterprise NM software). Indeed, cost cutting
has led to the evolution of management tools from proprietary and semi-proprietary
solutions based on specialized platforms to standards-based solutions. The Lucent group
argues that the efficiencies (as discussed throughout) that agent-based systems bring will
result in lower expenses for NM. The Lucent group underscores that the use
of artificial intelligence techniques to provide human-like behavior will result in
substantial savings in direct costs due to the decreased requirement for human operators.
This  holds  true  for  the  other  technologies,  but  the  capabilities  of  agent-based  systems 
provide  a  much  greater  value. 

The Lucent findings discussed are consistent with the shortcomings of the classical
protocols presented in Chapter II. The evaluation findings provide comparative evidence and
demonstrate  the  potential  for  intelligent-agent-based  technologies  as  an  overall  better 
solution  for  enterprise  NM,  relative  to  SNMP  and  the  other  common  protocols. 

C.  MULTI-AGENT  SYSTEM  MODELS 

There are quite a few multi-agent models that are under development, active, or
commercialized. This section provides the reader some insight into the nature of the
different MAS approaches. The models presented here are mostly derived from a
research project [37] that conducted an in-depth analysis of the MAS platforms that were most
suitable for the network management domain. Since the project publication, some of the
platforms have matured, some have dissolved, and some have been commercialized.
However, the overall objective here is to provide a sense of the various approaches that
can be undertaken. There are other platforms that are not mentioned here for several
reasons. Some platforms are proprietary and not openly available, many are not suited
for NM, and others are simply unknown. Interested readers are urged to
explore the web and sites such as www.multiagent.com and www.agents.umbc.edu.


The  research  project  considered  several  factors  for  selecting  the  multi-agent 
systems  discussed  below.  The  following  is  the  list  of  criteria  used  [37]: 

•  Communication:  What  protocol  does  the  system  use?  Is  it  flexible?  Does 
the  system  provide  directed  communication  and/or  multicast 
communication? 

•  Programming  Language:  Is  it  a  standard  language?  Easy  to  use? 
Compatible  with  other  components?  Portable? 

•  Flexibility:  Is  it  easy  to  adjust  the  system  to  a  particular  application?  What 
are  the  constraints  and  requirements? 

•  Architecture:  Is  the  system  object  oriented?  Layer  based?  Well  designed? 

•  User  Interface:  Can  the  agents  be  visualized?  How  does  the  user  interact 
with  the  system? 

•  Scalability:  Does  the  system  adapt  itself  to  different  situations?  Can  the 
system  be  widely  extended? 

•  User  friendliness:  Is  it  easy  to  start  using  the  system?  Is  the  learning 
curve  low? 

•  Identification:  How  do  agents  identify  one  another?  Is  it  a  centralized 
way  or  through  communication? 

•  Security: Are there any security features provided? Are the
communications  among  agents  encrypted? 

•  Extra  features:  Are  any  extra-features  available?  Are  the  agents  mobile? 
Are  any  coordination  constructs  available? 

The  next  several  paragraphs  are  summaries  from  the  research  project  [37] 
describing the characteristics of the various MAS platforms. In some cases, a synopsis of
the research findings is also presented. The MAS suggested in this study, CoABS, will
be described in greater detail in Chapter VI.

1.  JAFMAS 

Java-based  Agent  Framework  for  Multi-Agent  Systems  (JAFMAS)  [38]  is  a  Java- 
based  framework  for  representing  and  developing  cooperation  knowledge  and  protocols 
in  a  multi-agent  system.  The  framework  enables  the  agents  to  work  together  and 
coherently  achieve  their  goals  and  those  of  the  multi-agent  community  as  a  whole. 
JAFMAS  defines  a  generic  methodology  for  multi-agent  application  development  and 
provides a set of services that relieves the developer from the effort of programming
cooperation mechanisms from scratch. It guarantees that essential interoperation,
communication and cooperation facilities are available to support agent application
developers. JAFMAS is concerned with coordinating intelligent behavior among a
collection of intelligent agents forming the multi-agent system. Agents should coordinate
their knowledge, plans and goals so that they can take actions that result in a jointly
coherent solution to the problem at hand.


[Diagram: layers labeled User Operator Interface, Multiagent Application, Social Model,
Local Model, Linguistic Layer, Communication Protocol Layer, Infrastructure (JDK 1.1),
Operating System, and Hardware.]

Figure 5.  JAFMAS architecture [38]


Figure  5  shows  the  entire  JAFMAS  architecture  and  the  classes  composing  the 
different layers. JAFMAS provides sixteen main Java classes as shown (class names
in parentheses). Those classes provide the essential communication, interaction and
coordination  mechanisms  to  application  developers  by  dividing  the  services  provided  into 
distinct  layers. 

2.  JATLite 

JATLite (Java Agent Template, Lite) [39] is a package of Java classes and
programs that allow users to quickly create new systems of software agents that
communicate over the Internet in order to perform a distributed computation. The agents
may be newly created software or legacy software "wrapped" with software that
generates and receives agent messages as an integration mechanism. In addition to code
for creating agents, JATLite provides a robust agent infrastructure, as shown in Figure 6,
in which agents register with an Agent Message Router (AMR), using a name and
password, in order to exchange buffered messages and transfer files
with other agents on the Internet (some of which may be Java applets), and to
connect to, disconnect from, and reconnect to the joint computation. Communication may be
asynchronous, and intermittent agents are supported. There is no requirement for
installation of special software to host agents, and no special host is assumed for any
agent.


Figure 6. JATLite Agent Message Router [39]

The most important service of JATLite is the Agent Message Router (AMR) that
allows agents to fail and recover, to migrate, and to be applet-based. The AMR buffers
and forwards messages like an email server. Each agent makes a single socket connection
to the AMR - this is the only IP address each agent knows, in addition to its own. When
an agent sends a message, the AMR forwards it to the correct IP address for the recipient.
If the recipient agent is not connected (there is no active socket connection), the message is
delivered to the agent when it does connect again. The messages are saved on the AMR
until the recipient agent sends a delete signal. This simple idea eliminates lost messages
due to temporary agent failure, the necessity for agents to track IP addresses, and the
restriction on applet communications, as there now need be only an AMR on the server
that spawned the applet in order for it to exchange messages with any other agent
connected to the AMR [39].
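
The store-and-forward behavior described above can be illustrated with a minimal sketch. It is not JATLite code: the class MessageRouter and its methods are hypothetical names chosen for illustration, written in Python for brevity, and sockets are replaced by in-memory calls. It only shows the idea that a message stays buffered until the recipient explicitly deletes it.

```python
from collections import defaultdict, deque


class MessageRouter:
    """Toy AMR-like router (hypothetical API): buffers messages until deleted."""

    def __init__(self):
        self._passwords = {}                # agent name -> password
        self._connected = set()             # agents with an active connection
        self._mailbox = defaultdict(deque)  # agent name -> (id, sender, body)
        self._next_id = 0

    def register(self, name, password):
        self._passwords[name] = password

    def connect(self, name, password):
        if self._passwords.get(name) != password:
            raise PermissionError(f"bad credentials for {name}")
        self._connected.add(name)
        # On (re)connection the agent receives everything still buffered.
        return list(self._mailbox[name])

    def disconnect(self, name):
        self._connected.discard(name)

    def send(self, sender, recipient, body):
        if sender not in self._connected:
            raise ConnectionError(f"{sender} is not connected")
        self._next_id += 1
        message = (self._next_id, sender, body)
        self._mailbox[recipient].append(message)
        # Immediate delivery happens only if the recipient is connected.
        return message if recipient in self._connected else None

    def delete(self, name, message_id):
        """The recipient's delete signal; only now is the message discarded."""
        self._mailbox[name] = deque(
            m for m in self._mailbox[name] if m[0] != message_id
        )


router = MessageRouter()
router.register("a", "pw-a")
router.register("b", "pw-b")
router.connect("a", "pw-a")
undelivered = router.send("a", "b", "hello")  # b is offline: buffered only
pending = router.connect("b", "pw-b")         # b reconnects, gets the backlog
router.delete("b", pending[0][0])             # b acknowledges the message
```

Because every message passes through the one router object, the sketch also makes the centralization concern raised in the text easy to see.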

Although JATLite does provide essential functionality required for building a
multi-agent application, it does not define a methodology for specifying the social
behavior of agents. Moreover, the concept of the AMR is inherently centralized.
All communication must go through the AMR, and each time an agent joins or leaves the
system, it has to inform the AMR. This can lead to scalability problems [37].

3.  Aglets 

The Aglets Workbench [40] is a visual environment for building network-based
applications that use mobile agents to search, access, and manage corporate data and
other information. Aglets are mobile Java programs that can travel to and execute on
specialized nodes in the network. The Java Aglet Application Programming Interface of
the framework defines the methods necessary for Aglet creation, message handling in the
network, and initialization, dispatching, retraction, deactivation/activation, cloning and
disposing of the Aglet.

Figure 7. Aglet Serialization through the network [37]

The  Aglets  workbench  includes  an  Agent  Web  Launcher  named  Fiji  and  a  Visual 
Agent  Manager  named  Tahiti.  Fiji  is  a  Java  applet  based  on  the  Aglets  Framework  and 
therefore  capable  of  creating  an  Aglet  and  retracting  an  existing  Aglet  into  a  client’s  web 
browser.  Tahiti  uses  a  unique  graphical  user  interface  to  monitor  and  control  Aglets 
executing  on  a  given  computer.  It  also  implements  a  configurable  security  manager  that 
provides  a  fairly  high  degree  of  security  for  the  hosting  computer  system  and  its  owner. 
Although Aglets is intended more as a means of moving agents than as a multi-agent
framework, it can be combined with JKQML to allow communication among agents. JKQML
was developed to provide a framework and API for constructing Java-based,
KQML-speaking software agents that communicate over the Internet.

Aglets Workbench is a very versatile tool for creating secure mobile agent-based
applications. However, it does not deal with the important issue of implementing
coordination, cooperation and coherence in agent-based applications. Aglets can only
engage in directed communication as they use the TCP/IP protocol. Combined with
JKQML, Aglets can be an adequate tool when some mobility is needed
inside the agent community. Aglets has further advantages: the API is quite simple to use,
results can be obtained quickly, and a well-designed user interface is
provided to control the agents [37].

4. Concordia

Concordia  [41]  is  a  Java-based  framework  for  development  and  management  of 
network-efficient  mobile  agent  applications  for  accessing  information  anytime, 
anywhere,  and  on  any  device  (see  Fig.  8).  Concordia  offers  a  flexible  scheme  for 
dynamic  invocation  of  arbitrary  method  entry  points  within  a  common  agent  application. 
It  provides  support  for  agent  persistence  and  recovery  and  guarantees  the  transmission  of 
agents  across  a  network.  Concordia  was  designed  to  provide  fairly  complete  security 
coverage  from  the  outset. 

Within  Concordia,  an  agent’s  travel  plans  are  specified  by  its  Itinerary.  The 
Itinerary  is  a  completely  separate  data  structure  from  the  agent  itself.  Concordia  provides 
two  forms  of  asynchronous  distributed  events:  selected  events  and  group-oriented  events. 
The  event  selection  paradigm  enables  agents  to  define  the  types  of  events  they  wish  to 
receive.  In  contrast,  group-oriented  events  are  distributed  to  a  collection  of  agents  (known 
as  an  event  group)  without  any  selection. 
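
The difference between the two event forms can be sketched as follows. This is an illustrative Python model only, not the Concordia API; EventManager and its method names are assumptions made for the example. Selected events reach only the agents that asked for that event type, while group-oriented events reach every member of the group.

```python
class EventManager:
    """Hypothetical event manager contrasting selected and group events."""

    def __init__(self):
        self._selections = {}  # agent -> set of event types it asked for
        self._groups = {}      # group name -> set of member agents

    def select(self, agent, *event_types):
        self._selections.setdefault(agent, set()).update(event_types)

    def join_group(self, agent, group):
        self._groups.setdefault(group, set()).add(agent)

    def post_selected(self, event_type):
        # Only agents that selected this event type receive it.
        return sorted(
            agent for agent, types in self._selections.items()
            if event_type in types
        )

    def post_to_group(self, group):
        # Every member of the event group receives it, with no selection.
        return sorted(self._groups.get(group, set()))


events = EventManager()
events.select("collector", "LinkDown")
events.select("auditor", "ConfigChanged")
events.join_group("collector", "site-1")
events.join_group("auditor", "site-1")

selected_recipients = events.post_selected("LinkDown")
group_recipients = events.post_to_group("site-1")
```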


Although Concordia provides a useful set of services for implementing agent
mobility, security, persistence and transmission, it does not provide any methodology to
specify how agents in a multi-agent system coordinate, cooperate and negotiate to bring
about a coherent solution. Its emphasis is on the communication aspect of an
agent-based application. Moreover, because the agent Itinerary lies outside the agent,
the record of where the agent travels is maintained in a logical location separate from the
place where the agent lives. As a result, Concordia agents are not totally
autonomous [37].

5.  Odyssey 

Odyssey  is  an  agent  system  implemented  as  a  set  of  Java  class  libraries  that 
provide  support  for  developing  distributed  mobile  applications.  Odyssey  technology 
implements  the  concepts  of  places  and  agents.  It  models  a  network  of  computers, 
however  large,  as  a  collection  of  places.  A  place  offers  a  service  to  the  mobile  agents  that 
enter  it.  A  communicating  application  is  modeled  as  a  collection  of  agents.  Each  agent 
occupies  a  particular  place.  However,  an  agent  can  move  from  one  place  to  another,  thus 
occupying  different  places  at  different  times.  Agents  are  independent  in  that  their 
procedures  are  performed  concurrently.  Odyssey  provides  Java  classes  for  mobile  agents 
and  stationary  places. 


Figure  9.  Typical  Odyssey  application  [37] 


6.  Voyager 

Voyager [42] is a Java-based agent-enhanced Object Request Broker (ORB). It
allows Java programmers to quickly and easily create sophisticated network applications
using both traditional and agent-enhanced distributed programming techniques. It
provides for creation of both autonomous mobile agents and objects. Voyager agents
roam a network and continue to execute as they move. Voyager can remotely construct
and communicate with any Java class, even third party libraries, without source. It provides
seamless support for object mobility. Once created, any serializable object can be moved
to a new location, even while the object is receiving messages. Messages sent to the old
location are automatically forwarded to the new location.
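
One common way to obtain this forwarding behavior is to leave a forwarding entry at the old location when an object moves. The sketch below is a Python illustration under that assumption; it does not reproduce Voyager's implementation, and Host, MobileObject and move are hypothetical names.

```python
class MobileObject:
    def __init__(self, object_id):
        self.object_id = object_id
        self.inbox = []


class Host:
    """Hypothetical host that keeps a forwarding entry for departed objects."""

    def __init__(self, name):
        self.name = name
        self.objects = {}   # object id -> object living here
        self.forwards = {}  # object id -> Host the object moved to

    def deliver(self, object_id, message):
        host = self
        # Follow forwarding entries until the object's current host is found.
        while object_id in host.forwards:
            host = host.forwards[object_id]
        host.objects[object_id].inbox.append(message)
        return host.name


def move(obj, source, destination):
    destination.objects[obj.object_id] = obj
    # Drop any stale entry in case the object is returning to this host.
    destination.forwards.pop(obj.object_id, None)
    del source.objects[obj.object_id]
    source.forwards[obj.object_id] = destination


a, b = Host("a"), Host("b")
counter = MobileObject("counter")
a.objects["counter"] = counter
move(counter, a, b)
landed_on = a.deliver("counter", "increment")  # sent to the old location
```

A sender that only knows the old location still reaches the object, at the cost of one extra hop per move.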


Figure  10.  Voyager’s  universal  architecture  [42] 

Voyager is a very efficient tool for constructing agent-based distributed
applications. However, it does not provide any classes for defining the social behavior of
agents, does not support broadcast communication or speech-act messaging, and is
lacking in security.

7. IA Factory

The goal of the IA Factory [43] is to supply the programmer with an Application
Program Interface (API) that spares the programmer from having to do all of the network
programming and debugging. This framework provides a generic agent whose behavior a
programmer can extend. In the simplest case, a table of behavior is
sufficient. When an agent requires greater complexity, programmers can extend a class to
give an agent complex and interesting behavior.

The IA Factory creates a set of agents, along with an interface to run them. These
agents communicate via the KQML language (there is also another version of the library
for commercial use that uses XML messages). The agents are lightweight, which means
that hundreds of agents can run within one Java Virtual Machine. The source code to the
agents is generated in Java; hence, they can be customized to increase the functionality of
intelligent agents.
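
The two levels of customization described for the generic agent (a table of behavior in the simple case, a subclass when more is needed) can be sketched as follows. The sketch is in Python rather than Java for brevity, and TableAgent and CountingAgent are hypothetical names, not classes generated by the IA Factory. The strings "ask-one", "tell" and "sorry" are KQML performatives.

```python
class TableAgent:
    """Generic agent whose replies come from a behavior table."""

    def __init__(self, name, behavior):
        self.name = name
        self.behavior = dict(behavior)  # incoming performative -> reply

    def receive(self, performative, content):
        # Unknown performatives get the KQML "sorry" reply.
        return self.behavior.get(performative, "sorry")


class CountingAgent(TableAgent):
    """Subclass adding behavior a plain table cannot express."""

    def __init__(self, name, behavior):
        super().__init__(name, behavior)
        self.seen = 0

    def receive(self, performative, content):
        self.seen += 1
        reply = super().receive(performative, content)
        return f"{reply} (request {self.seen})"


simple = TableAgent("status", {"ask-one": "tell up"})
table_reply = simple.receive("ask-one", "router-1 status?")
fallback_reply = simple.receive("achieve", "reboot router-1")

counting = CountingAgent("status-2", {"ask-one": "tell up"})
first = counting.receive("ask-one", "router-1 status?")
second = counting.receive("ask-one", "router-2 status?")
```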

8.  RETSINA 

RETSINA  (Reusable  Environment  for  Task  Structured  Intelligent  Network 
Agents)  [44]  is  an  open  multi-agent  system  (MAS)  that  supports  communities  of 
heterogeneous  agents.  The  RETSINA  system  has  been  implemented  on  the  premise  that 
agents in a system should form a community of peers that engage in peer-to-peer
interactions.  Any  coordination  structure  in  the  community  of  agents  should  emerge  from 
the  relations  between  agents,  rather  than  as  a  result  of  the  imposed  constraints  of  the 
infrastructure  itself.  In  accordance  with  this  premise,  RETSINA  does  not  employ 
centralized  control  within  the  MAS;  rather,  it  implements  distributed  infrastructural 
services  that  facilitate  the  interactions  between  agents,  as  opposed  to  managing  them. 

Following is a graphical representation of the RETSINA MAS:

Figure 11. RETSINA MAS [44]

The RETSINA framework is being used to develop distributed collections of
intelligent software agents that cooperate asynchronously to perform goal-directed
information retrieval and information integration in support of performing a variety of
decision-making tasks. A collection of RETSINA agents forms an open society of
reusable agents that self-organize and cooperate in response to task requirements. Their
designer focused on three crucial characteristics of the overall framework that
differentiate RETSINA from others:

• Use of a multi-agent system where the agents operate asynchronously and
collaborate with each other and their user(s)

• Agents actively seek out information

• Information gathering is seamlessly integrated with problem solving and
decision support

The  RETSINA  functional  architecture  consists  of  four  basic  agent  types: 

1.  Interface  agents  -  interact  with  users,  receive  user  input,  and  display 
results. 

2. Task agents - help users perform tasks, formulate problem-solving plans
and carry out these plans by coordinating and exchanging information
with other software agents.

3.  Information  agents  -  provide  intelligent  access  to  a  heterogeneous 
collection  of  information  sources. 

4.  Middle  agents  -  help  match  agents  that  request  services  with  agents  that 
provide  services. 
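
The role of a middle agent can be illustrated with a minimal matchmaking sketch. It is an assumption-laden Python illustration, not RETSINA code: Matchmaker and its methods are hypothetical names. Providers advertise capabilities, requesters ask who offers a capability, and the requester then contacts the provider directly, so the middle agent facilitates the interaction rather than managing it.

```python
class Matchmaker:
    """Hypothetical middle agent: matches requests to advertised services."""

    def __init__(self):
        self._adverts = {}  # capability -> list of provider names

    def advertise(self, provider, capability):
        self._adverts.setdefault(capability, []).append(provider)

    def unadvertise(self, provider):
        # Remove the provider from every capability it advertised.
        for providers in self._adverts.values():
            if provider in providers:
                providers.remove(provider)

    def find(self, capability):
        # Return a copy so callers cannot alter the registry.
        return list(self._adverts.get(capability, []))


middle = Matchmaker()
middle.advertise("info-agent-1", "snmp-poll")
middle.advertise("info-agent-2", "snmp-poll")
middle.advertise("info-agent-2", "syslog-search")

pollers = middle.find("snmp-poll")
middle.unadvertise("info-agent-1")
pollers_after = middle.find("snmp-poll")
```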

RETSINA addresses the problem of how to facilitate communication among
agents of different types. As part of the RETSINA infrastructure of reusable agents,
middle agents represent an important step in the ongoing effort to provide a foundation
that will allow heterogeneous agent types and architectures to interoperate successfully.
Each RETSINA agent has four reusable modules for communicating, planning,
scheduling, and monitoring the execution of tasks and requests from other agents.

a.  The  Communication  and  Coordination  module  accepts  and  interprets 
messages  and  requests  from  other  agents. 

b.  The  Planning  module  takes  as  input  a  set  of  goals  and  produces  a  plan 
that  satisfies  the  goals. 

c. The Scheduling module uses the task structure created by the planning
module to order the tasks.

d. The Execution module monitors this process and ensures that actions are
carried out in accordance with computational and other constraints.

Below is a graphic representation of the RETSINA agent architecture:

Figure 12. RETSINA agent architecture

The set of Java classes within RETSINA emphasizes how agents can find
information or advertise their capabilities. No particular attention is given to the
behavior of each agent or to how it interacts with other agents. The Name Server API
uses a centralized approach, making the system less scalable and less fault-tolerant. As
RETSINA is an open system, any agent on the Internet can communicate and interact
with the actual community [37].

9.  MAST 

The MAST (Multi-Agent System Tool) is a heterogeneous, multi-agent,
general-purpose MAS. It employs a decentralized model of control and consists of two basic
types of entities: agents, defined as autonomous entities that may carry out specific tasks
by themselves or in conjunction with other entities, and the network through which they
interact. MAST agents consist of a structured set of elements that include services, goals,
resources, internal objects, and control. Control in MAST is merely a specification of
how an agent handles a service request. The network consists of a yellow page service
incorporated in an agent that facilitates lookup services. MAST is a loosely coupled
system of agents that use the Common Knowledge Representation Language (CKRL) to
communicate and MAST-ADL (Agent Description Language) to describe agents [45].

The  MAST  architecture  is  a  very  complete  multi-agent  system  tool.  However,  the 
fact  that  it  is  implemented  in  C++  makes  it  less  portable  and  less  multifunctional  than 
Java-based  frameworks.  Many  of  the  services  within  MAST  (interoperability  between 
heterogeneous  agents)  are  unnecessary  because  they  are  already  included  in  the  Java 
language  [37]. 

10.  dMARS 

dMARS  is  an  agent-oriented  development  and  implementation  environment 
designed  for  building  complex,  distributed,  time-critical  systems.  It  is  intended  for  rapid 
configuration  and  ease  of  integration,  and  it  helps  with  system  design,  maintenance,  and 
reengineering.  dMARS  agents  are  designed  according  to  the  BDI  (Beliefs,  Desires,  and 
Intentions)  model.  They  are  able  to  reason  about  their  environment,  their  beliefs,  their 
goals,  and  their  intentions.  They  model  their  expertise  as  a  set  of  context-sensitive  plans. 
These  plans  can  both  react  to  changes  in  the  environment  and  proactively  pursue  the 
agent's  objectives.  Using  dMARS,  multi-agent  systems  can  be  implemented  as 
lightweight  processes  within  a  single  UNIX  process,  as  separate  UNIX  processes  on  the 
same  machine,  or  as  a  distributed  configuration  communicating  over  a  TCP/IP  network. 
Interfacing  with  other  processes  is  achieved  via  a  simple,  well-defined  communication 
protocol.  The  system  provides  comprehensive  libraries  and  components  to  support  the 
development,  implementation  and  testing  of  an  application,  therefore  minimizing  the 
need  to  develop  application-specific  support  code. 
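
The notion of context-sensitive plans can be made concrete with a small sketch. It is a simplified Python illustration of the BDI idea only; it does not use dMARS syntax or its API, and Plan, Agent and the example plans are hypothetical. Each plan names the event or goal that triggers it and a context condition over the agent's beliefs; the first applicable plan becomes an intention and is executed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plan:
    name: str
    trigger: str                     # event or goal the plan responds to
    context: Callable[[dict], bool]  # must hold in the current beliefs
    body: Callable[[dict], None]     # action, modeled as a belief update


class Agent:
    def __init__(self, beliefs, plans):
        self.beliefs = dict(beliefs)
        self.plans = list(plans)
        self.intentions = []         # names of the plans adopted so far

    def handle(self, trigger):
        # Adopt the first plan whose trigger and context both match.
        for plan in self.plans:
            if plan.trigger == trigger and plan.context(self.beliefs):
                self.intentions.append(plan.name)
                plan.body(self.beliefs)
                return plan.name
        return None


plans = [
    Plan("reroute", "link_down",
         lambda b: b["backup_link"],
         lambda b: b.update(route="backup")),
    Plan("alert_operator", "link_down",
         lambda b: not b["backup_link"],
         lambda b: b.update(alerted=True)),
]

agent = Agent({"backup_link": True, "route": "primary"}, plans)
chosen = agent.handle("link_down")
```

The same trigger selects a different plan when the beliefs differ, which is what makes the plans context-sensitive.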

dMARS is written in C/C++ and therefore does not provide true architecture neutrality
and portability. Unlike Java-based systems, applications developed in dMARS can only
run on limited platforms and require using different compilers for different platforms. It
supports only a limited number of C++ compilers.

IV. ENTERPRISE NETWORK MANAGEMENT IN THE
DEPARTMENT OF DEFENSE


The  purpose  of  this  chapter  is  to  give  the  reader  a  top-down  overview  of  the 
requirement  for  enterprise  network  management  within  the  Department  of  Defense 
(DoD).  The  first  part  provides  the  background  of  the  vision  to  transform  the  military  into 
an  integrated  enterprise-level  joint  force  that  is  the  most  technically  superior  and  capable 
in  the  world.  The  study  then  works  its  way  down  to  discuss  the  requirements  for  the 
enabling  network  enterprise  infrastructure  and  the  impact  that  it  will  have  on 
management and control. The final part presents the Army’s transformation and efforts to
establish  and  control  an  enterprise-level  network  infrastructure. 

A.  BACKGROUND  -  TRANSFORMATION  OF  THE  ARMED  FORCES 

To fight and win against the asymmetric threats it faces today, the United States
recognized the need to transform the military. With asymmetric threats there is no clear
battle line between opposing forces; the enemy is much smaller, decentralized, and
clandestine, and employs tactics such as terrorism against non-combatants and guerrilla
warfare. Therefore, DoD devised a strategy that
leverages state-of-the-art technologies to make the military a more effective and efficient
integrated joint force. This strategy is embodied in two linked documents known as the
Joint Vision 2010 and Joint Vision 2020.

1.  Joint  Vision  2010/2020 

JV 2010/2020 lays down the conceptual framework for how the United States
Armed Forces will transform into a futuristic, highly technical, superior joint force. It
envisions  the  development  of  a  superior  joint  force  that  is  dominant  across  the  full 
spectrum  of  military  operations  -  persuasive  in  peace,  decisive  in  war,  preeminent  in  any 
form  of  conflict  [1].  The  integration  of  core  competencies  provided  by  the  individual 
services  and  components  is  essential  to  establishing  a  superior  joint  force.  This  means 
that  the  force  must  be  fully  joint  -  intellectually,  operationally,  organizationally, 
doctrinally,  and  technically. 

The Joint Vision is characterized by four layered concepts that lead up to full
spectrum dominance (see Fig. 13): decision superiority,
information superiority, network-centric warfare (NCW), and the global information grid
(GIG). Each concept, starting with the GIG, provides a distinctive
capability and is the foundation that enables the layer above it. These individual layers are
the building blocks that collectively enable full spectrum dominance.


Figure  13.  Achieving  Full  Spectrum  Dominance  [38] 

As shown in Figure 13, each underlying layer is a foundation that enables
the layer above it. Information superiority is the key enabler of decision superiority
for the warfighter, which ultimately leads to full spectrum dominance. Further,
information superiority is enabled by the NCW concept, which requires the aggregation
and interoperability of the stovepiped networks, legacy systems, and applications. The
GIG is the underlying infrastructure that integrates the networks and systems that enable
NCW.

The foundation of this research study is derived from the envisioned technology
requirement necessary to achieve the JV2010/JV2020 concepts. A pivotal aspect of the
transformation entails an infusion of state-of-the-art technology to move
from a platform-centric capability to a network-centric capability, and ultimately to a
knowledge-centric capability over the next quarter century. A platform-centric capability
focuses on individual platforms (weapons systems, information systems, etc.) that are
networked and managed within centralized intranets. The network-centric capability is
based on fusing the platform-centric stovepipes into a single, fully integrated, and
decentralized network for sharing information across platforms, thus flattening the
command and control hierarchy. The knowledge-centric capability will evolve the
network-centric capability to allow generated knowledge to be distributed and shared
throughout the military enterprise.

2.  Information  Superiority 

To reach full spectrum dominance, the transformation of the joint force depends
upon  information  superiority  as  the  key  enabler.  Joint  Publication  1-02  defines 
information  superiority  as  the  capability  to  collect,  process,  and  disseminate  an 
uninterrupted  flow  of  information  while  exploiting  or  denying  an  adversary’s  ability  to  do 
the  same.  The  essence  of  information  superiority  is  further  elaborated  in  JV2020  [1]: 

Information superiority provides the joint force a competitive advantage
only  when  it  is  effectively  translated  into  superior  knowledge  and 
decisions.  The  joint  force  must  be  able  to  take  advantage  of  superior 
information  converted  to  superior  knowledge  to  achieve  “decision 
superiority”  -  better  decisions  arrived  at  and  implemented  faster  than  an 
opponent  can  react,  or  in  a  noncombat  situation,  at  a  tempo  that  allows  the 
force  to  shape  the  situation  or  react  to  changes  and  accomplish  its  mission. 

In  order  to  gain  information  superiority  within  the  context  of  full  spectrum 
dominance,  the  joint  force  must  be  fully  synergized  in  terms  of  information  and 
intelligence  sharing,  and  situational  awareness.  The  interconnecting  of  platforms  into  a 
single  shared  awareness  environment  to  achieve  information  superiority  is  the  underlying 
principle  of  network-centric  warfare. 

3.  Network  Centric  Warfare 

According  to  the  pioneers  [36],  Network  Centric  Warfare  focuses  on  the  combat 
power  that  can  be  generated  from  the  effective  linking  or  networking  of  the  warfighter 
enterprise.  It  is  characterized  by  the  ability  of  geographically  dispersed  forces  to  create  a 
high  level  of  shared  battlespace  awareness  that  can  be  exploited  via  synchronization  and 
other network-centric operations to achieve commanders’ intent.

Figure 14. Network Centric [3]


The information advantage gained through the use of NCW allows a warfighting
force  to  achieve  dramatically  improved  information  positions,  in  the  form  of  common 
operational  pictures  that  provide  the  basis  for  shared  situational  awareness  and 
knowledge,  and  a  resulting  increase  in  combat  power.  The  ability  to  achieve  shared 
situational  awareness  and  knowledge  among  all  elements  of  a  joint  force,  in  conjunction 
with  allied  and  coalition  partners,  is  increasingly  viewed  as  a  cornerstone  of 
transformation  to  achieve  future  warfighting  capabilities  [5]. 

One of the key concepts of the NCW definition is the “effective linking
or  networking”  among  entities  in  the  battlespace.  This  means  that  dispersed  and 
distributed  entities  can  generate  synergy,  and  that  the  responsibility  and  work  can  be 
dynamically  reallocated  to  adapt  to  the  situation.  The  effective  linking  requires  the 
establishment  of  a  robust,  high-performance  information  infrastructure,  or  infostructure, 
which  provides  all  the  elements  of  the  warfighting  enterprise  with  access  to  high  quality 
information  services  [36]. 

Establishing a robust, high-performance infostructure is where the realization of
NCW gets complicated. Establishing the infostructure implies the integration of
heterogeneous, legacy, and proprietary systems, the establishment of a single enterprise
network, and the expansion of network control. The military must figure out how to
integrate the disparate systems in order to monitor and control them at the enterprise level. The
gist of this study is that the limitations of the incumbent NM protocols will eventually
stifle the migration to full-blown NCW. The traditional protocols will encounter
problems with the scalability, reliability, flexibility, and adaptability that are essential to
NCW.

The DoD’s concept to establish an underlying robust, high-performance
infrastructure  to  effectively  link  the  entities  in  the  battlespace  is  known  as  the  Global 
Information  Grid  (GIG).  The  next  section  discusses  the  GIG  in  detail. 

4.  Global  Information  Grid 

The idea of a GIG originated from growing concerns regarding
interoperability and end-to-end integration of automated information systems,
streamlined management, and information infrastructure investments within the military.
However, the real demand for a GIG was driven by the requirement to achieve full
spectrum dominance as expressed in JV 2010/2020. The GIG provides the enabling
foundation for NCW. The success of the GIG will depend in large part on how well it
helps achieve force-wide information sharing.

The  GIG  vision  is  outlined  below  [4]: 

•  A  single  secure  grid  providing  seamless  end-to-end  capabilities  to  all 
warfighting,  national  security,  and  support  users 

•  Supporting  DoD  and  Intelligence  Community  (IC)  requirements  from 
peace  time  business  support  through  all  levels  of  conflict 

•  Joint,  high  capacity  netted  operations 

•  Fused  with  weapons  systems 

•  Supporting  strategic,  operational,  tactical,  and  base/post/camp/station 

•  Plug  and  Play  interoperability 

o  Guaranteed  for  US  and  allied 
o  Connectivity  for  coalition  users 

•  Tactical  and  functional  fusion  a  reality 

•  Information/bandwidth  on  demand 

•  Defense in depth against all threats

Figure 15. Global Information Grid [47]


On  2  May  2001,  the  official  definition  of  the  GIG  was  agreed  upon  and  published 
by  the  DoD  Chief  Information  Officer  (CIO),  the  Under  Secretary  of  Defense  (USD)  for 
Acquisition,  Technology  and  Logistics  (AT&L),  and  the  Joint  Staff/J6  [4]: 

Globally  interconnected,  end-to-end  set  of  information  capabilities, 
associated  processes,  and  personnel  for  collecting,  processing,  storing, 
disseminating,  and  managing  information  on  demand  to  warfighters, 
policy  makers,  and  support  personnel.  The  GIG  includes  all  owned  and 
leased  communications  and  computing  systems  and  services,  software 
(including  applications),  data,  security  services,  and  other  associated 
services  necessary  to  achieve  information  superiority.  It  also  includes 
National  Security  Systems  (NSS)  as  defined  in  section  5142  of  the 
Clinger-Cohen  Act  of  1996.  The  GIG  supports  all  DoD,  National 
Security,  and  related  Intelligence  Community  (IC)  missions  and  functions 
(strategic,  operational,  tactical,  and  business)  in  war  and  in  peace.  The 
GIG  provides  capabilities  from  all  operating  locations  (bases,  posts, 
camps,  stations,  facilities,  mobile  platforms,  and  deployed  sites).  The  GIG 
provides  interfaces  to  coalition,  allied,  and  non-DoD  users  and  systems. 

a.  GIG  Functions 

The explosion of GIG requirements will increase the demand for, and
criticality of, network troubleshooting, network management, dynamic bandwidth
management, information and network protection, and spectrum management [6]. Four
defined functions characterize the information flow and exchange within the GIG:
Computing, Communications, Presentation, and Network Operations (NETOPS). Each
of these functions imposes a set of complex challenges that must be overcome to realize
the full nature of the GIG. However, the scope of this study is mainly focused within the
NETOPS functional area. More specifically, this study explores the problems, issues,
and solutions for the “network management” sub-function of NETOPS.

NETOPS is an organizational and procedural framework used to monitor,
manage, and control the GIG by means of the sub-functions of Network Management
(NM), Information Dissemination Management (IDM), and Information Assurance (IA)
[5]. Focusing on NM, the GIG NM function is defined as the capability to monitor,
control and ensure the visibility of the various networking and internetworking
components.

b. GIG Network Management Capabilities Requirements

The  Capstone  Requirements  Document  (CRD)  [5]  for  the  GIG  defines  the 
network  management  capabilities  requirements  that  are  essential  to  effectively  monitor, 
manage,  and  control  the  GIG  as  an  enterprise  network.  The  CRD  points  out  that  network 
management  is  the  set  of  activities  that  establishes  and  maintains  the  GIG  network 
switching, transmission, information services, and computing resources available to
fulfill  users’  telecommunications  and  connectivity  needs  and  demands.  The  CRD  further 
points  out  that  the  key  GIG  NM  services  are  fault,  configuration,  account,  performance, 
and  planning  management.  The  following  paragraphs  reflect  the  detailed  capabilities 
requirements  for  network  management  as  outlined  in  the  CRD.  The  capabilities 
requirements  are  shown  in  italics  within  each  discussion  area. 

•  GIG  End-to-End  Situational  Awareness.  Network  managers,  on  behalf  of 
commanders,  must  have  real-time  knowledge  of  the  network.  This 
knowledge  must  encompass  awareness  of  all  aspects  of  the  network, 
including  all  network  assets,  their  physical  location,  and  their  logical 
relationship within the network. To accomplish GIG end-to-end
situational awareness, systems shall have the NM capability of
automatically generating and providing an integrated/correlated
presentation of networks and all associated network assets.

•  Dynamic, Predictive Planning. Systems shall have the NM capability to
perform  dynamic,  predictive  planning  by  gathering,  storing  and  using 
knowledge  about  GIG  assets/resources,  so  as  to  optimize  their  utilization. 
Knowing  equipment  types  and  quantities  available  to  support  an  operation 
is  imperative  for  GIG  utilization  planners.  Initially,  a  database  must  be 
defined  and  populated  with  organizations  and  their  known  GIG 
assets/resources.  Once  defined  and  populated,  the  database  should  have 
the  capability  to  be  modified,  as  required,  to  support  changing  mission 
requirements  to  include  activation/deactivation.  The  network  management 
system  should  include  network  design  and  engineering  functions  that 
account  for  all  voice,  video,  and  data  networks  that  could  comprise  a 
proposed  system,  including  commercial  technology.  These  functions 
should include automated mapping of network topology; measurement and
recording  of  traffic  flow  data;  trend  analysis;  spectrum  planning  and 
management;  propagation  analysis;  electromagnetic  resolution;  and 
electronic  key  management.  A  modeling  and  simulation  capability  should 
be  provided  to  allow  a  planner  to  assess  the  impact  of  changes  to  a  system 
or  network,  without  interrupting  the  operational  network.  Systems  shall 
have  the  NM  capability  to  create/modify/distribute  GIG  network  plans  and 
orders  in  accordance  with  user  requirements. 

•  Distributed  and  Partitioned  Network  Control.  Systems  shall  have  the  NM 
capability  to  transfer  control  rapidly  of  one  or  more  objects  or  groups  of 
varying  size,  and  reestablish  control  when  relinquished  without  hindering 
end-to-end  visibility  by  the  senior  network  manager,  while  maintaining 
continuous  control.  Only  one  designated  active  manager  for  a  network 
object  should  be  permitted  at  any  given  time.  However,  oversight  of 
managers  of  network  objects  may  shift  as  forces/assets  are  apportioned, 
allocated,  or  assigned  without  requiring  a  change  of  the  active  manager. 

•  Remote Object and Network Control and Configuration. Network
managers must be able to monitor, configure, and control all aspects of the
network and observe changes in network status. Networks comprising the
GIG are evolutionary in nature and generally are comprised of both legacy
and  emerging  systems,  some  with  their  own  management  systems. 
Systems  shall  have  a  NM  capability  that  leverages  existing  and  evolving 
technologies  and  has  the  ability  to  perform  remote  network  device 
configuration/reconfiguration  of  objects  that  have  existing  DoD  JTA 
management  capabilities. 

•  Network  Status.  Components  of  the  GIG  provide  metrics  to  network 
managers  to  allow  them  to  make  decisions  on  managing  the  network. 
Systems  shall  have  an  automated  NM  capability  to  obtain  the  status  of 
networks  and  associated  assets  in  near  real  time  99%  (Threshold,  Key 
Performance  Parameter  -  KPP)  and  99.9%  (Objective,  KPP)  of  the  time. 

•  Automated  Fault  Management.  Systems  shall  have  the  NM  capability  to 
perform  automated  fault  management  of  the  network,  to  include  problem 
detection,  fault  isolation  and  diagnosis,  problem  tracking  until  corrective 
actions  are  completed,  and  historical  archiving.  This  capability  allows 
network  managers  automatically  to  monitor  and  maintain  the  situational 
awareness  of  the  network’s  manageable  devices,  and  to  become  aware  of 
network  problems  as  they  occur  based  on  the  trouble  tickets  generated 
automatically  by  the  affected  object  or  network.  Alarms  will  be  correlated 
to  eliminate  those  that  are  duplicates  or  false,  initiate  test,  and  perform 
diagnostics  to  isolate  faults  to  a  replaceable  component. 
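
The alarm-correlation step named in the Automated Fault Management requirement
can be illustrated with a short sketch. The Python example below is a minimal
illustration only and is not part of the CRD: the Alarm record, its field names,
and the 60-second window are assumptions made for the example. It suppresses an
alarm when the same device has already reported the same fault within the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record; the field names are assumptions."""
    device: str       # identifier of the managed object raising the alarm
    fault: str        # fault type, e.g. "link_down"
    timestamp: float  # seconds since an arbitrary epoch


def correlate(alarms, window=60.0):
    """Return the alarms with duplicates removed.

    An alarm is treated as a duplicate if the same device reported the same
    fault no more than `window` seconds after the last alarm that was kept
    for that (device, fault) pair.
    """
    last_kept = {}  # (device, fault) -> timestamp of the last alarm kept
    kept = []
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        key = (alarm.device, alarm.fault)
        previous = last_kept.get(key)
        if previous is not None and alarm.timestamp - previous <= window:
            continue  # duplicate within the window: suppress it
        last_kept[key] = alarm.timestamp
        kept.append(alarm)
    return kept


if __name__ == "__main__":
    raw = [
        Alarm("router-1", "link_down", 0.0),
        Alarm("router-1", "link_down", 5.0),    # duplicate of the first
        Alarm("switch-7", "link_down", 6.0),    # different device: kept
        Alarm("router-1", "link_down", 120.0),  # outside the window: kept
    ]
    for alarm in correlate(raw):
        print(alarm)
```

A production correlator would also have to recognize false alarms and related
alarms from different devices; this sketch covers only exact duplicates.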

The military faces many challenges in achieving the
aforementioned network management capabilities requirements. A number of
obstacles, such as scalability issues, will require innovative solutions in order to
accomplish these requirements and bring the GIG to fruition at an affordable cost.

c.  GIG Network Management Shortcomings

The military is inundated with an overwhelming number of disparate
information systems, applications, and network architectures. This dilemma has plagued
the military for years because no single standard was ever established. The effort to
integrate these systems into a coherent GIG will by no means be a simple endeavor. An
even  greater  issue  is  finding  or  developing  a  single  network  management  platform  that 
can  manage  and  control  these  incompatible  systems  and  display  a  network  common 
operational  picture  (NETCOP)  to  maintain  situational  awareness  of  the  GIG.  The  CRD 
identifies  a  number  of  critical  network  management  shortcomings  that  can  impede  the 
GIG  implementation.  These  findings  are  consistent  with  the  shortcomings  discussed  in 
Chapter  II,  and  reinforce  the  need  for  a  more  sophisticated  enterprise  NM  solution: 

•  There is a lack of asset visibility resulting in an inability to effectively
manage the overall network to support common user needs. The limited
network visibility is significantly impacted by the large number of
stovepiped and legacy systems. Stovepiped and legacy systems are
normally not designed to support global, end-to-end network management
or adhere to a prescribed set of standards for interoperable use across DoD
and the Intelligence Community. However, there are
dedicated/specialized systems that are required to accomplish specific
command missions, but do not support or facilitate effective network
management of these systems.

•  There  are  no  common  prescribed  standards  for  common  user 
systems/networks  that  would  facilitate  network  management  across 
DoD/IC.  This  shortfall  precludes  effective  network  management,  which  is 
essential  to  ensure  the  most  efficient  and  effective  exchange  of 
information  across  the  battlespace. 

•  There  is  no  distributed  network  management  capability  that  would  allow 
the  management  of  common  user  networks  from  more  than  just  one 
central  location. 

•  Existing network management is currently unable to provide a fully
integrated multi-level security network.

•  DoD has little or no network management capability to accompany its
increasingly widespread use and application of advanced mobile wireless
computing and networking, which are inherently ad hoc.

•  There  is  no  prescribed  standard  joint  network  management  capability  for 
JTF  component-level  common  user  systems/networks.  Deployed  network 
management  suffers  from  a  loosely  federated  approach  for  employing 
government  unique  (formerly  government-off-the-shelf  (GOTS))  and 
COTS  software. 

•  Current  end-to-end  communications,  especially  in  the  last  tactical  mile, 
are  not  fully  integrated  and  interoperable.  Specific  issues  include 
heterogeneous  network  design,  inconsistent  firewall  implementations  and 
varying  network  management  policies  and  tools. 

This  list  of  shortcomings  clearly  identifies  the  complexities  that  the 
military  will  encounter  in  the  pursuit  of  the  GIG.  Attempting  to  standardize  the  systems 
across  the  enterprise  is  unrealistic  and  too  costly.  Additionally,  even  with  standardization 
there  will  still  be  such  issues  as  scalability.  Therefore,  the  military  will  have  to  look 
beyond the current technologies for viable solutions. The CRD offers no suggested
solution, nor is it intended to.

d.  GIG Challenge

As  noted  in  the  shortcomings  above,  as  well  as  the  numerous  complexities 
presented  in  Chapter  II,  realizing  the  full  implementation  of  the  GIG  is  much  easier  said 
than  done.  Within  the  purview  of  this  study,  the  SNMP  standard  required  by  the  JTA 
currently  lacks  the  robustness,  scalability,  reliability,  flexibility,  and  adaptability 
necessary  to  meet  the  requirements  as  outlined  above.  However,  it  is  important  to 
understand  that  the  NM  challenge  of  the  GIG  extends  beyond  the  arguments  made  in  this 
study. For example, not all devices that should be managed have standard MIBs embedded in
them. Devices such as personal computers, personal digital assistants (PDAs), and mobile
devices do not usually have MIB structures loaded. Small, lightweight devices such as
PDAs are resource constrained, so in many cases manufacturers prefer not to
embed MIBs in them. However, there is nothing to preclude MIBs from being loaded [28].

Bordetsky and Dolk [46] address another important GIG challenge:
the complexities inherent in managing the emerging wireless networks that the GIG must
also accommodate. Relative to this study, they discuss the issue of managing an
increasing number of SNMP MIB objects. They present a scenario based on the
Sprint PCS system, whose customer base of wireless-only web users had reached one million:


In  January  2001,  Sprint  PCS  announced  that  its  customer  base  for  wireless 
only  web  users  reached  the  one  million  mark.  With  Palm  Pilots  rapidly 
merging  wireless  services,  we  can  picture  each  emerging  mobile  user  as  a 
node  with  2-5  terminal  devices.  Typically  each  mobile  terminal  appearing 
in  the  Mobile  Switching  Center  Registries  requires  on  the  order  of  20 
objects,  so  an  approximation  of  the  complexity  of  a  network  comparable 
to  Sprint  PCS  would  range  from  40  -  100  million  objects.  And  this  is  only 
one  of  the  four  wireless  technologies. 
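
The object-count estimate in the quoted scenario is a simple product, which the
following Python sketch reproduces. The figures (one million users, 2 to 5 terminal
devices per user, and roughly 20 objects per terminal) come from the quotation; the
function and parameter names are illustrative only.

```python
def managed_objects(users, devices_per_user, objects_per_device):
    """Estimate the number of management objects in a network."""
    return users * devices_per_user * objects_per_device


USERS = 1_000_000        # wireless-only web users cited for Sprint PCS
OBJECTS_PER_DEVICE = 20  # "on the order of 20 objects" per mobile terminal

low = managed_objects(USERS, 2, OBJECTS_PER_DEVICE)
high = managed_objects(USERS, 5, OBJECTS_PER_DEVICE)
print(f"{low:,} to {high:,} objects")  # 40,000,000 to 100,000,000 objects
```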

This scenario underscores another difficult challenge for GIG
implementation using traditional standards, as elaborated in Chapter II. The military must
consider diverse technologies such as the emerging wireless technologies and determine
how to converge them while managing the sheer number of management objects.
Bordetsky and Dolk propose a Knowledge Management solution in [46].

V.  ENTERPRISE NETWORK MANAGEMENT IN THE ARMY


This  chapter  provides  an  overview  of  the  Army’s  plan  to  transform  itself  in 
accordance  with  JV2010/2020.  The  chapter  focuses  on  the  Army’s  enterprise  network 
management  approach  to  give  some  insight  on  how  the  Army  plans  to  operate  and 
implement  it.  This  gives  the  reader  a  feel  for  the  inherent  complexities  and  what  is 
required  for  enterprise-level  network  management  in  the  Army. 

A.  BACKGROUND 

In line with the full spectrum dominance concept in JV2010/2020, the Army
developed a strategy called “Army Knowledge Management” (AKM) to transform itself
into a network-centric, knowledge-based, Internet-age enterprise force. Conceptually, it
will improve information access and sharing, while providing the information
infrastructure (infostructure) capabilities across the Army. AKM is intended to improve
decision dominance by dramatically enhancing the warfighter’s ability to distribute,
process, fuse, and correlate unprecedented amounts of actionable data into information,
and to do so securely, reliably, and quickly enough to enable leaders to synchronize
and mass effects for decisive results [22].

The  AKM  strategy  consists  of  five  goals: 

1.  Leverage all information technology capabilities of the Department
of Defense and the Army at the enterprise-level to reduce the total
cost of ownership.

2.  Integrate  best  business  practices  to  promote  Army  transformation 
to  a  net-centric,  knowledge-based  force. 

3.  Manage  the  information  technology  infrastructure  as  an  Army 
enterprise. 

4.  Implement  a  web-based  enterprise  knowledge  portal  to  provide 
universal,  secure  access  for  the  entire  Army. 

5.  Harness the human capital of our Army IT workforce through
effective recruitment, training, and retention of our soldiers and
civilians.

In accordance with AKM goal 3, the Secretary of the Army published the “AKM
implementing guidance Goal 3,” dated 18 September 2001, directing the Army to
establish the Network Enterprise Technology Command/9th Army Signal Command
(NETCOM) to act as the single Army network operator and defender. NETCOM is the
Army's single authority to establish, operate, and manage the Army enterprise
infostructure (AEI). NETCOM has technical command and control and configuration
management authority for the Army's critical networks and systems, and will have
operational review/coordination authority for any standards, system, architecture, design,
or device that impacts enterprise-level Army infostructure and Network Operations
(NETOPS) [24]. Accordingly, NETCOM has assumed technical control of all Army
networks: Active, Guard, and Reserve.

B.  ARMY  ENTERPRISE  INFOSTRUCTURE 

The  Army  Enterprise  Infostructure  (AEI)  [25]  is  the  Army’s  portion  of  the  Global 
Information Grid (GIG). The AEI is the underlying backbone that will enable
network-centric warfare (NCW) capabilities in the Army. Physically, the AEI is the shared
computers, ancillary equipment, software, firmware, hardware, services, people, business
processes, facilities and related resources used in the acquisition, storage, manipulation,
protection, management, movement, control, display, switching, interchange,
transmission, or reception of all types of data or information in any format, including
audio,  video,  imagery,  voice,  or  data. 

The AEI includes the sustaining base (posts, camps, and stations) with Wide,
Metropolitan, Campus, and Local Area Networks (WAN/MAN/CAN/LAN) that extend
from the sustaining base to the tactical environment. The AEI is not one contiguous
network; it consists of those portions of the GIG that the Army is responsible for
operating and managing. It encompasses the Command, Control, Communications,
Computers, and Information Management (C4IM) platforms and services supporting
Army users, both in permanent stations and deployed. This includes the Active Army,
the Army Reserve and
the  Army  National  Guard. 

The scope of the AEI encompasses worldwide access to the GIG for Army information
services. This includes the Top Secret/Sensitive Compartmented Information (TS/SCI)
domain and/or networks, the Army Secret security domain, the Army Sensitive but
Unclassified security domain, Army public information sites on the Internet, and a variety of
telephony systems (e.g., Defense Red Switch Network and Defense Switched Network).
The AEI includes core services such as email, web, file and print servers, directories,
Army Knowledge Online (AKO), Public Key Infrastructure (PKI)/Common Access Card
(CAC), and Army enterprise applications such as personnel and logistics. Figure 16
illustrates  the  physical  scope  of  the  AEI. 


Figure  16.  Army  Enterprise  Infostructure  [25] 

The  current  Army  sustaining  base  operating  environment  consists  of  a  variety  of 
geographically  dispersed  small,  medium,  and  large  installations  and  user  sites  (e.g., 
Reserve  Component  Centers,  Armories,  and  Recruiting  stations),  in  the  Continental 
United States (CONUS) and outside the Continental United States (OCONUS). Small
installations have fewer than 5,000 users, medium installations have
5,000 to 15,000 users, and large installations have more than 15,000 users. These
installations host several IT facilities providing support not only to Army users but also to
other Service/Agency tenants. In addition, the Army has many mobile users that require
access to the AEI from outside the enclave. Thus, multiple separate communities of
interest with varied IT requirements are found on most Army installations.

Under  NETCOM,  many  disparate  operations  will  be  transitioned  to  consolidated 
operations  and  management.  There  are  large  headquarters  complexes,  such  as 
Department  of  the  Army  headquarters  in  the  Pentagon.  There  are  several  commands  that 
have  facilities  located  around  the  world  in  remote  locations,  such  as  the  Army  Corps  of 
Engineers,  and  Space  and  Missile  Defense  Command.  Some  organizations  have  activities 
and  isolated  users  located  in  other  government  and  commercially  leased  facilities 
throughout CONUS and, in some cases, OCONUS.

Deployed  Army  units  access  the  GIG  through  Standard  Tactical  Entry  Points 
(STEP),  commercial  satellite,  and  terrestrial  communications  links,  all  managed  and 
operated  by  the  Defense  Information  Systems  Agency  (DISA).  Deployed  units  operate 
and  maintain  internal  networks  that  provide  data,  voice,  imagery,  and  video  support. 
These  internal  networks  are  referred  to  as  the  Tactical  Internet  (TI)  and  currently  consist 
of  Tri-Services  Tactical  Equipment,  Mobile  Subscriber  Equipment,  Tactical  Satellite,  and 
the  Enhanced  Position  Location  Reporting  System.  In  the  future,  the  TI  will  be  replaced 
by  the  Warfighter  Information  Network  -Tactical  and  the  Joint  Tactical  Radio  System. 

C.  NETOPS 

The  NETOPS  [25]  concept  is  an  organizational,  procedural,  and  technological 
construct  for  ensuring  information  superiority  and  enabling  speed  of  command  for  the 
warfighter.  It  is  defined  as  the  operation  and  management  of  the  AEI  -  the  organizations, 
procedures, and technologies required to monitor, manage, coordinate, and control the
AEI  as  the  Army  portion  of  the  GIG.  NETOPS  links  together  widely  dispersed  network 
operations  centers  (NOCs)  through  a  command  and  organizational  relationship.  It 
establishes  joint  tactics,  techniques,  and  procedures  to  ensure  a  joint  procedural  construct, 
and  establishes  a  technical  framework  in  order  to  create  a  Network  Common  Operational 
Picture (NETCOP).

1.  NETOPS Goals

Below  are  the  established  Army  goals  for  NETOPS: 

•  Enable universal (and secure) access to authorized infostructure services
to all Army customers within the Army infostructure - secure single
sign-on "plug & play" capability

•  Accurately  display  a  total  and  integrated  Situation  Awareness  of  the  AEI 

•  Predict  impacts  on  the  AEI  of  new/changed  systems  and  operational 
contingencies 

•  Redirect  and  reallocate  AEI  resources  in  near  real-time  to  support  Army 
response  to  crisis  or  unplanned  event  anywhere  within  the  Army 
infostructure  Operational  Area  (AOR) 

•  Provide  a  consistent,  robust,  base-level  of  infostructure  services  to  all 
authorized  Army  customers  at  the  least  cost  feasible  within  Army 
operational  constraints 

•  Provide additional (above base level) infostructure services to Army
customers  on  a  reimbursable  basis 

•  Perform  continuing  and  non-intrusive  technology  insertion  to  improve 
service  levels  or  reduce  cost  of  providing  current  base-level  services 

•  Provide  Continuity  of  Operations  Plan  capabilities 

In order to achieve these goals, the Army has set out to standardize and
consolidate all network operations across the infrastructure at the enterprise level. By
taking this approach the Army intends to increase the quality of service provided to the
end-users, better utilize personnel required to perform operational tasks, and minimize
the total cost of providing infostructure services. However, consolidation of routine
functions  causes  the  physical  execution  of  those  functions  to  move  away  from  the 
physical  proximity  of  the  supported  users.  Therefore,  the  Army  intends  to  establish 
comprehensive  remote  management  capabilities  by  maintaining  consolidated  support 
areas  at  the  installation  level.  The  goal  here  is  to  maintain  a  high  level  of  IT  support  and 
service  at  the  installation  but  to  reduce  the  inherent  redundancies  and  inefficiencies  of  the 
current  structure.  In  the  future,  the  consolidated  support  will  gradually  shift  to  a  level 
above  installation  and  then  ultimately  to  the  enterprise-level. 

2.  Systems  &  Network  Management 

The NETOPS concept is composed of three integrated mission areas:

•  Systems & Network Management (S&NM)

•  Information  Dissemination  Management  (IDM) 

•  Information  Assurance  (IA) 

Together  these  mission  areas  facilitate  the  implementation  of  “Service 
Assurance.”  This  approach  provides  “Assured  Network  Availability”,  “Assured 
Information  Protection”,  and  “Assured  Information  Delivery”  at  the  strategic, 
operational,  and  tactical  levels  through  a  co-evolution  of  doctrine,  processes,  and 
technology [25]. Figure 17 depicts the NETOPS mission areas and functions.

Figure 17.  NETOPS Mission Areas and Functions [25]


Systems & Network Management is the management of the network and the
devices connected to the network. It is the sum of three management areas: Network
Management (to include devices, servers, storage devices, and end-user devices like
printers, workstations, laptops, and handheld computers), Satellite Communications
(SATCOM) Management, and Frequency Spectrum Management.

The  NETOPS  concept  was  designed  to  account  for  the  basic  network 
management  (NM)  functions  of  fault,  configuration,  accounting,  performance,  and 
security  management  (FCAPS).  This  includes  systems  and  applications  management  and 
comprises  all  the  measures  necessary  to  ensure  the  effective  and  efficient  operations  of 
networked systems. Network Management is the only one of the NETOPS
management areas that is germane to this study; hence, it is the only area covered.

3.  Army  Network  Operations  and  Security  Center 

The  Army  Network  Operations  and  Security  Center  (ANOSC)  is  NETCOM’s 
central  agency  responsible  for  executing  enterprise  network  operations  and  defense.  The 
ANOSC  provides  worldwide  operational  and  technical  support  for  the  AEI.  It  is 
responsible  for  reporting  AEI  situational  awareness  to  the  Army  command  and  DoD 
NETOPS. Additionally, on behalf of NETCOM, the ANOSC interfaces with all internal and
external NOSCs for AEI coordination.

The ANOSC oversees subordinate NOSCs that provide distributed network
operations and security within specific geographical areas of responsibility (AORs).
Currently, each Army theater AOR has a Theater NOSC (TNOSC), including one in
CONUS (C-TNOSC). However, the intent is to move all NOSCs and NOSC-like facilities
under NETCOM operations and management as part of NETCOM’s function as the single
authority for operations and management. Each TNOSC acts as a single point of contact for
Army network services, operational status, and anomalies in its respective theater
AOR. It is also responsible for providing visibility and status information to the ANOSC
and DISA Regional NOSCs. Network operations and security responsibilities are further
decentralized within the TNOSCs (see Figure 18).

ANOSC - Army Network Operations and Security Center
TNOSC - Theater Network Operations and Security Center
RNOSC - Regional Network Operations and Security Center
NSC - Network Service Center
STARC - State Area Regional Command (ARNG)
DOIM - Director of Information Management

Figure  18.  ANOSC  and  NOSC  Relationships  [25] 


D.  NETWORK  COMMON  OPERATIONAL  PICTURE 

A  key  aspect  of  enterprise  network  management  is  the  ability  for  the  Army  to 
have  comprehensive  situational  awareness  of  the  AEI.  As  part  of  the  GIG  initiative,  the 
Deputy  Secretary  of  Defense  directed  all  of  the  services  to  create  and  maintain  a  Network 
Common  Operational  Picture  (NETCOP)  [25]. 

The  NETCOP  is  an  integrated  capability  that  receives,  correlates,  and  displays  a 
view  of  voice,  video,  and  data  telecommunications  networks,  systems,  and  applications  at 
the  installation/tactical,  region,  theater,  and  global  levels  through  the 
installations/deployed  tactical  forces,  Network  Service  Centers  (NSCs),  TNOSCs,  and 
ANOSC  respectively.  At  each  level  the  NETCOP  reflects  status,  performance,  and 
information  assurance.  At  a  minimum,  the  NETCOP  requires:  telecommunications, 
system,  and  application  fault  and  performance  status;  and  significant  information 
assurance reports such as network intrusions or attacks.

Figure 19.  Dissemination of NETCOP Information [25]

The  NETCOP  provides  the  ability  for  Combatant  Commanders,  Service 
Components,  Sub-unified  Commands,  Joint  Task  Forces  (JTFs),  and  deployed  forces  to 
rapidly  identify  outages  and  degradations,  network  attacks,  mission  impacts,  C4 
(Command,  Control,  Communications  and  Computers)  shortfalls,  operational 
requirements,  and  problem  resolutions  at  the  strategic,  operational,  and  tactical  levels. 
Figure  19  illustrates  the  conceptual  process  by  which  the  NETCOP  will  be  distributed  to 
the  various  organizations  that  have  a  need  for  this  information. 
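
The hierarchical character of the NETCOP can be sketched as a status roll-up in
which each level reports the worst status found among its subordinates. The Python
example below is an illustration under stated assumptions only: the three status
values, the node names, and the worst-status rule are hypothetical and are not taken
from [25].

```python
from dataclasses import dataclass, field

# Hypothetical status scale, ordered from best (0) to worst (2).
SEVERITY = {"up": 0, "degraded": 1, "down": 2}


@dataclass
class Node:
    """One reporting level, e.g. an installation, NSC, TNOSC, or the ANOSC."""
    name: str
    status: str = "up"  # status observed locally at this level
    children: list = field(default_factory=list)

    def rolled_up_status(self):
        """Return the worst status among this node and all its subordinates."""
        statuses = [self.status]
        statuses += [child.rolled_up_status() for child in self.children]
        return max(statuses, key=SEVERITY.__getitem__)


if __name__ == "__main__":
    installation_a = Node("Installation A", "up")
    installation_b = Node("Installation B", "degraded")
    nsc = Node("NSC", children=[installation_a, installation_b])
    tnosc = Node("TNOSC", children=[nsc])
    anosc = Node("ANOSC", children=[tnosc])
    print(anosc.rolled_up_status())  # degraded
```

A single worst-case value is the simplest possible summary; an actual NETCOP would
carry fault, performance, and information assurance detail at every level.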

E.  SUMMARY 

This  chapter  provides  a  high-level  overview  of  how  the  Army  is  attempting  to 
establish  an  enterprise  network  management  operation  to  manage  and  control  the  AEI. 
The magnitude of this endeavor is considerable:
there are tens of thousands of devices that NETCOM will have to control as an
enterprise. A recent C-TNOSC briefing entitled “CONUS NETOPS Status and Issues”
[48] identifies some of the challenges. NETCOM has only limited control of network
devices: “direct monitoring of devices is limited to those devices directly managed by the
C-TNOSC...” Additionally, the Army NETCOP, which has been online for a year, relies
on network management information, such as network outages, being pushed from the
bottom up instead of being controlled at the enterprise level. This is tied to the limited
number of devices that are directly managed. The presentation also points out the huge
level of effort involved in managing at the enterprise level (see Figure 20).

Figure 20.  A day in the CTNOSC [48]


Figure 20 depicts a typical day monitoring the AEI in the C-TNOSC. The
C-TNOSC is inundated with up to 5 million events daily that have to be filtered. Once
the events are filtered, trouble tickets must be assigned for corrective action and
reported. This is a highly intensive and costly operation that requires considerable
personnel, equipment, processing power, and bandwidth.
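
To put the daily event volume in perspective, the following Python sketch converts
it to an average rate. The 5 million figure is the upper bound cited above; the
assumption that events arrive uniformly over 24 hours is a simplification made only
for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_event_rate(events_per_day):
    """Average events per second, assuming a uniform arrival rate."""
    return events_per_day / SECONDS_PER_DAY


rate = average_event_rate(5_000_000)
print(f"{rate:.1f} events per second")  # 57.9 events per second
```

Even this average, which ignores bursts, comes to roughly 58 events every second.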

The labor-intensive operation depicted in Figure 20 exemplifies
the insufficiency of traditional protocols such as SNMP. For instance, SNMP simply
gathers the data and reports it for handling by human operators. While it does off-load
some of the computational load, it has no capability to off-load the four-step event process
described above. The idea of intelligent-agent-based network management is to
significantly reduce the intensity and cost by allowing intelligent agents to carry out many
of these functions in a more effective and efficient manner.

VI.  CONCEPTUAL APPROACH


This chapter presents a conceptual intelligent-agent-based architecture to
emphasize the potential of intelligent-agent-based technology for Army enterprise
network management. The first part gives an overview of the Control of Agent-Based
Systems (CoABS) multi-agent system that serves as the infrastructure for the conceptual
architecture. The second part of the chapter describes the details of the conceptual
architecture. 

A.  CONTROL  OF  AGENT-BASED  SYSTEMS 

Control  of  Agent-Based  Systems  (CoABS)  [51]  is  a  research  program  of  the  US 
Defense  Advanced  Research  Projects  Agency  and  the  US  Air  Force  Rome  Labs.  The 
program  is  aimed  at  developing  and  demonstrating  techniques  to  safely  control, 
coordinate  and  manage  large  systems  of  autonomous  software  agents.  The  program 
investigates  the  use  of  agent  technology  to  improve  military  command,  control, 
communication,  and  intelligence  gathering  by  enhancing  the  dynamic  connection  and 
operation  of  military  planning,  command,  execution,  and  combat  support  systems  to 
quickly  respond  to  a  changing  operational  picture. 

Over  twenty  universities  and  companies  are  participating  in  the  CoABS  research 
effort.  Each  participating  organization  brings  to  the  program  its  own  agent  architecture, 
with  different  agent  communication  languages,  ontologies,  and  agent-based  services. 
Some of the organizations and architectures involved include RETSINA agents from
Carnegie Mellon University, TEAMCORE agents from the University of Southern
California Information Sciences Institute, D'Agents from Dartmouth College, EMAA
from Lockheed Martin, and Nomads from the University of West Florida. CoABS is a
six-year DARPA program that will end in December 2003.

1.  CoABS  Background 

The CoABS program was initiated to address many of the challenges within
today's military environment. From the CoABS perspective, the military environment is
very dynamic, with operations changing quickly, hardware and software
moving, connecting, and disconnecting, and network bandwidth availability varying
greatly. There are essential, but inflexible, stove-piped legacy systems that need to be
integrated. There is information overload, with vastly increased data available, but
inadequate tools to filter the data. Lastly, the military environment is heterogeneous, with
multiple standards and interfaces, as well as multiple hardware and software platforms
[49].

To resolve the inherent complexities within the military environment, the
CoABS  visionaries  realized  that  the  military  needed  a  customized  software  technology 
solution  that  could  be  rapidly  developed  at  a  low  cost  and  execute  on  readily  available 
hardware  without  overloading  conventional  processors.  This  need  is  what  triggered  the 
CoABS  concept  -  leverage  previous  research  in  distributed  artificial  intelligence  to 
develop  intelligent  agents  that  will  handle  the  complexities  more  efficiently  and 
effectively. 

The  CoABS  program  also  identified  the  need  for  cooperation  among  the 
heterogeneous  agents  produced  by  different  developers.  Cooperation  among  agents  is 
critical  to  building  powerful  applications  to  support  military  capability.  Without 
cooperation,  monolithic  agents  would  have  to  handle  each  new  task.  Control  strategies 
are  needed  to  build  small  teams  of  agents  that  can  cooperate  in  a  robust  and  flexible 
manner,  as  well  as  a  very  large  number  of  agents  that  exhibit  macro  scale  behavior 
without attending to the detailed behavior of individual agents. Furthermore, there are no
adequate algorithms, policies, or mechanisms that prevent a large heterogeneous set of
agents  from  exhibiting  dangerous  or  chaotic  behavior  on  a  network.  This  lack  of  control 
could  lead  to  clogged  networks,  wasted  resources,  poor  performance,  system  shutdowns, 
and  security  vulnerabilities. 

The CoABS program set out to develop technologies for the control of multi-agent
systems  with  predictable  behavior  for  automating  military  command  and  control  in  a 
cost-effective  manner.  If  successful,  the  systems  of  cooperating  agents  and  agent 
ensembles  are  expected  to  dramatically  reduce  the  information  systems  workload  for  the 
entire  spectrum  of  military  forces  from  the  national  command  authority  down  to  the 
small-unit  level  as  well  as  provide  a  framework  for  resource  management  in  a  dynamic 
hostile  or  unpredictable  environment  in  which  software  systems  are  adaptable,  self- 
configuring,  self-healing  and  evolvable. 

Specifically, CoABS proposes to develop:


•  A  simple  agent  programming  methodology  supported  by  sophisticated 
component  libraries  that  can  automate  complex  functions  cheaply  and 
easily  with  agents  assembled  from  powerful  pieces. 

•  Compatible  agent  behavior  models. 

•  Interoperable  agent  communication  languages. 

•  Advanced,  fully  protective  agent  services  for  protecting  both  agents  and 
hosts/current  servers/existing  data  sources. 

•  Simple  methods  of  understanding  agent  behavior. 

2.  Agent  Grids 

A grid is fundamentally a mechanism/infrastructure that helps integrate resources.
For example, a power grid and a transportation grid each supply an enabling capability
(electrical power, transportation of goods and services) that fulfills the infrastructure needs
of diverse businesses. In a computer-related grid, such resources exist at multiple
technical levels, and hence grids can exist at these same levels. Brian Kettler [50]
explains the various types of grids:

1.  Computational  grids  integrate  distributed  computing  resources  to 
address  supercomputing,  collaborative  computing,  and  on-demand 
computing  applications. 

2.  Data  grids  integrate  diverse  types  of  data  into  unified,  interrelated 
collections  that  support  complex  applications.  Current  databases 
illustrate  some  of  the  data  integration  facilities,  and  the  Web  illustrates 
some  of  the  diversity  and  scope,  required  in  such  grids. 

3.  Object  grids  extend  data  grids  with  associated  software  components, 
and  hence  provide  unified  access  to  both  data  and  the  resources 
necessary  to  operate  on  that  data.  Distributed  object  systems  and  the 
Web  illustrate  some  of  the  aspects  required  in  such  object  grids. 

4. Agent grids can be thought of as an extension of object grids with
“smarter” software. The CoABS Grid, explained later, is an example of an
agent grid.


Kettler further explains that, in addition to the need for the individual technical
grid levels described above, these individual grid levels need to be unified. An
agent-level grid supporting this requirement should provide both grid capabilities at the
computation and data/object levels in support of agents, as well as grid capabilities at
these other levels enabled by agents. Both these types of support are important in making
the maximum use of agent-level capabilities. For example, agent-level grids can take
advantage of the capabilities of underlying computational grids in supporting their load
balancing and quality-of-service requirements (particularly where the higher-level grids
can interact directly with the lower levels to exert control). Operational agent grids will
also need to interact with data and object systems because much information and
software functionality that will need to be accessible to agent grids will continue to exist
in these systems. The CoABS Grid is an agent-level grid that encompasses these
properties.

3.  The  CoABS  Grid 

The  CoABS  Grid  [49]  is  a  framework  for  federating  heterogeneous  agent  systems 
designed  to  meet  the  challenges  of  the  military  environment,  as  well  as  address  the 
heterogeneity  among  the  participating  agent  research  communities.  The  CoABS  Grid  is 
an  adaptive  and  robust  collection  of  infrastructure,  services,  agents,  standards,  and 
protocols.  It  enables  the  run-time  integration  and  dynamic  interoperability  of  distributed 
agents, objects, devices, and legacy systems. “You can also think of the Grid as this
infrastructure layer and all the agents and services running on it.” [51] Although the
CoABS  Grid  is  being  developed  for  military  application,  it  is  a  general-purpose  agent 
framework  with  potential  use  by  a  wide  variety  of  applications,  such  as  enterprise 
network  management. 

The  CoABS  Grid  supports  dynamic  registration  and  discovery  of  relevant 
participants  and  flexible  run-time  communications.  It  includes  a  method-based 
application  programming  interface  to  register  agents,  advertise  their  capabilities,  discover 
agents  based  on  their  capabilities,  and  send  messages  between  agents.  Agents  can  be 
added  and  upgraded  without  reconfiguring  the  network.  Failed  or  unavailable  agents  are 
automatically  purged  from  the  registry. 

The CoABS Grid has some unique characteristics that distinguish it from other
agent infrastructures. The CoABS Grid is transport neutral in terms of agent
communication. The CoABS Grid defines a well-known message-delivery interface.
Grid agents must have a proxy that supports the message-delivery interface, but the agent
is free to use any transport to communicate with that proxy. In order to send a message to
an agent, the sender makes a local method call to an agent proxy. The proxy transfers the
message to the actual agent using a protocol private to the agent and proxy. If the
transport mechanism is changed, the sender code is not affected. The CoABS Grid
provides a means of obtaining agent proxies based on agent characteristics and types.
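
The transport-neutral proxy idea can be illustrated with a minimal Python sketch. All class and method names here (MessageDelivery, add_message, InProcessProxy, EncodedProxy) are hypothetical stand-ins chosen for illustration; they are not the CoABS Grid API, which is written in Java. The point is only that the sender makes the same local call whichever private transport the proxy uses.

```python
from abc import ABC, abstractmethod


class MessageDelivery(ABC):
    """Hypothetical stand-in for a well-known message-delivery interface."""

    @abstractmethod
    def add_message(self, text: str) -> None:
        ...


class Agent:
    """A trivial agent that stores whatever its proxy hands over."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox = []


class InProcessProxy(MessageDelivery):
    """Proxy whose private 'transport' is a direct in-memory call."""

    def __init__(self, agent: Agent) -> None:
        self._agent = agent

    def add_message(self, text: str) -> None:
        self._agent.inbox.append(text)


class EncodedProxy(MessageDelivery):
    """Proxy whose private 'transport' encodes to bytes, as a network link would."""

    def __init__(self, agent: Agent) -> None:
        self._agent = agent

    def add_message(self, text: str) -> None:
        wire = text.encode("utf-8")  # what would travel over the link
        self._agent.inbox.append(wire.decode("utf-8"))


def notify(proxy: MessageDelivery, text: str) -> None:
    """Sender code: one local method call, no knowledge of the transport."""
    proxy.add_message(text)


a, b = Agent("monitor"), Agent("control")
notify(InProcessProxy(a), "link down")
notify(EncodedProxy(b), "link down")
```

Swapping InProcessProxy for EncodedProxy changes nothing in notify, which is the property the paragraph above attributes to the Grid's proxy design.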

CoABS  Grid  communication  is  fully  distributed,  in  that  each  agent  sending  a 
message  communicates  directly  with  the  receiver,  using  the  proxy  registered  by  the 
receiver. That is, all agent-to-agent communication is point-to-point. Thus, as the
number  of  agents  on  the  CoABS  Grid  increases,  agent  communication  performance  is 
only  affected  by  the  distribution  of  the  agents  in  the  network,  the  network  bandwidth,  and 
the  details  of  the  particular  proxy  implementation. 

4.  Elements  of  the  Grid 

At  an  abstract  level,  the  CoABS  Grid  consists  of  four  main  elements: 
Applications,  Grid-aware  components,  Grid  services,  and  the  Grid  infrastructure.  Figure 
21 depicts the relationship between these elements. Applications are shown as red ovals
(e.g., the Master Battle Planner). Most of these are legacy applications, but some could
be composed of agents. Each application has a proxy agent that uses a module known as a
GridServiceHelper to interoperate with other components on the Grid and with the Grid
core services (Grid services are green rectangles), shown in the center of the diagram.
Yellow  ovals  represent  various  data  sources  that  are  accessed  by  the  applications  to 
which  they  are  linked.  Data  is  obtained  from  a  variety  of  collection  mechanisms,  shown 
with  military  equipment  icons.  Components  that  can  be  connected  to  the  Grid  (typically 
via  a  GridServiceHelper  or  GridAgentHelper)  are  called  Grid-aware  components  (GAC). 

Figure 21. Sample Grid-enabled Application for Coalition Military Operations [50].


Grid  services  (green  boxes  in  Fig.  21)  provide  functionality  that  enables  Grid- 
aware  components  to  interoperate  and  form  applications  (e.g.,  component  discovery, 
brokering,  translation,  etc.).  They  also  help  ensure  the  smooth  operation  of  applications 
by  providing  facilities  for  instrumenting,  testing,  debugging,  visualizing,  and  managing 
applications.  Grid-aware  components  can  access  Grid  services  by  access  mechanisms 
such  as  protocols,  wrappers,  et  cetera. 

The Grid infrastructure is comprised of “lower-level” infrastructures (possibly
layered themselves) and mechanisms (protocols, wrappers, etc.) that are used (1) by Grid
services to talk to one another and (2) by components directly to talk to other components
and to access Grid services. The Grid Infrastructure leverages other infrastructures that
provide specific connectivity (and services) among components of the same kind.
However, the Grid Infrastructure does not necessarily replace the other infrastructures
because these architectures are often optimized for the kinds of components they
integrate. At the lowest level, there is the network infrastructure (network software,
protocols, and network hardware) such as the Internet or possibly a private intranet.
This provides the low-level transport mechanisms between distributed machines [50].

5. CoABS Grid Implementation [50]

This section provides an overview of how the CoABS Grid is implemented,
describing Grid operations and component interactions. The details within this section
were extracted from various CoABS technical papers found at the CoABS website
[49] [50] [51]. Figure 22 below illustrates the CoABS Grid architecture that is described
in this section.


Figure  22.  CoABS  Grid  Architecture  [50]. 


The  CoABS  Grid  is  built  using  the  Jini™  Connection  technology  developed  by 
Sun  Microsystems.  The  robust  and  dynamic  nature  of  the  CoABS  Grid  is  derived  from 
Jini™.  The  CoABS  Grid  software  is  written  in  Java  and  also  uses  Java  Remote  Method 
Invocation  (RMI)  for  inter-agent  communication.  Members  of  the  CoABS  research 
community  have  created  proxies  that  integrate  the  CoABS  Grid  with  agent  systems 
written  in  C++  and  Lisp  and  for  the  Palm  Pilot  KVM.  The  CoABS  Grid  takes  advantage 
of  three  important  components  of  Jini™: 

1. the Jini™ concept of a service, which is used to represent an agent,

2.  the  Jini™  Lookup  Service  (LUS),  which  is  used  to  register  and 
discover  agents  and  other  services,  and 

3.  Jini™  Entries,  which  are  used  to  advertise  an  agent’s  capabilities. 

A Jini™ service is a Java object that is serialized and stored in the LUS. The LUS
supports  lookup  of  services  based  on  type,  attribute  values,  and  unique  identifier.  When  a 
Jini™  client  performs  a  lookup  through  the  LUS,  the  service  object  is  returned  to  the 
client.  The  service  may  optionally  be  a  proxy  that  uses  a  remote  connection  to 
communicate  back  to  the  true  service  at  a  different  location.  The  remote  connection  is 
transparent  to  the  client  and  can  be  of  any  type,  e.g.  RMI,  CORBA,  or  secure  socket. 

The  LUS  grants  leases  to  registered  services,  assigns  globally  unique  identifiers  to 
services, and supports lookup of services. It is the service's responsibility to maintain its
lease with the LUS; however, Jini™ provides helper classes to do this automatically. If a
service  cannot  maintain  its  lease  because  of  either  failure  of  the  service  or  failure  of  the 
network  connection  between  the  service  and  the  LUS,  the  service  will  be  purged  from  the 
LUS,  so  that  the  LUS  contents  remain  current. 
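
The lease mechanism described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Jini API: the class LeaseRegistry, its method names, and the 30-second lease length are all hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class LeaseRegistry:
    """Toy lookup registry with leases; hypothetical, not the Jini LUS API."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self._expiry = {}  # service name -> time at which its lease lapses

    def register(self, name: str, now: float) -> None:
        self._expiry[name] = now + self.lease_seconds

    def renew(self, name: str, now: float) -> bool:
        """Extend a lease; return False if the lease has already lapsed."""
        if self._expiry.get(name, float("-inf")) <= now:
            self._expiry.pop(name, None)
            return False
        self._expiry[name] = now + self.lease_seconds
        return True

    def live_services(self, now: float) -> list:
        """Purge lapsed leases, then return the names still registered."""
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


registry = LeaseRegistry(lease_seconds=30.0)
registry.register("monitor-agent", now=0.0)
registry.register("fault-agent", now=0.0)

registry.renew("monitor-agent", now=20.0)  # a healthy agent keeps renewing
# "fault-agent" has failed (or lost its connection) and never renews.
remaining = registry.live_services(now=40.0)
```

In this sketch only the agent that renewed its lease survives the purge, so the registry contents stay current without any explicit deregistration by the failed agent.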

Jini™  provides  helper  classes  that  use  a  multicast  protocol  to  find  any  LUSs  that 
are  running  within  a  local  area  network.  No  prior  knowledge  of  the  machine  name  or  port 
that  the  LUS  is  running  on  is  required.  Jini™  provides  a  unicast  protocol  to  find  LUSs 
outside  the  local  area  network.  Service  registration  is  maintained  in  all  local  and  distant 
LUSs.  The  registration  is  automatically  propagated  to  any  new  LUS  processes  that  are 
started.  Multiple  LUSs  can  be  run  for  robustness  and  scalability.  If  one  goes  down,  the 
others  will  still  maintain  registration  and  lookup. 

Jini™  services  are  described  in  the  form  of  a  Jini™  Entry.  An  Entry  is  a 
collection  of  service  attributes  that  is  stored  in  the  LUS  along  with  the  service.  Many 
Entries  can  be  stored  for  a  single  service.  Entry  templates  are  used  in  Jini™  and  CoABS 
Grid  lookup  methods  to  match  registered  services.  Entry  templates  and  service  types  can 
be  used  to  filter  the  number  of  services  that  are  downloaded  from  a  LUS  over  the 
network. 

The  CoABS  Grid  uses  Jini™  Entries  for  agent  capability  advertisements.  The 
CoABSAgentDescription  Entry  has  fields  for  agent  name,  description,  organization, 
architecture,  ontologies,  content  languages,  display  icon  URL,  documentation  URL,  and 
unique  ID. 
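
Template-based lookup can be sketched as follows. The sketch assumes the usual Jini convention that an unset (null) template field acts as a wildcard and a set field must match exactly. The AgentDescription class below is a hypothetical three-field subset of the advertised attributes, and the sample organization and architecture values are invented for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AgentDescription:
    """Hypothetical subset of advertised fields; None means 'any' in a template."""
    name: Optional[str] = None
    organization: Optional[str] = None
    architecture: Optional[str] = None


def matches(template: AgentDescription, entry: AgentDescription) -> bool:
    """True when every non-None template field equals the entry's field."""
    return all(
        getattr(template, f.name) is None
        or getattr(template, f.name) == getattr(entry, f.name)
        for f in fields(template)
    )


def lookup(registered, template):
    """Return only the matching entries, as a template-filtered lookup would."""
    return [e for e in registered if matches(template, e)]


registered = [
    AgentDescription("monitor-1", "NPS", "RETSINA"),
    AgentDescription("fault-1", "NPS", "D'Agents"),
    AgentDescription("monitor-2", "SRI", "RETSINA"),
]

nps_agents = lookup(registered, AgentDescription(organization="NPS"))
nps_retsina = lookup(
    registered, AgentDescription(organization="NPS", architecture="RETSINA")
)
```

Adding fields to the template narrows the result set, which is how a template limits the number of services downloaded from the lookup service.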

The CoABS Grid provides helper utility classes that are local to an agent and that
hide  the  complexity  of  Jini™.  These  classes  automatically  find  any  LUS  in  both  the  local 
area  network  and  user-designated  distant  machines.  The  CoABS  Grid  supports  agent  and 
service  discovery  based  on  Jini™  Entries  and  arbitrary  predicates  as  well  as  by  service 
type.  The  CoABS  Grid  also  provides  event  notification  when  agents  register,  deregister, 
or  change  their  advertised  attributes. 

The  CoABS  Grid  defines  a  Jini™  service  interface  called  the  AgentRep,  which  is 
a  proxy  to  the  agent.  This  interface  defines  a  method  called  addMessage(),  which  uses  a 
remote  connection  to  deliver  a  message  back  to  the  agent.  Thus,  when  a  client  agent  calls 
a  CoABS  Grid  lookup  method,  a  proxy  that  allows  immediate  direct  communication  back 
to  the  agent  is  returned.  The  client  agent  can  include  its  own  AgentRep  in  the  message  it 
delivers,  so  that  two-way  communication  can  be  established  with  no  further  lookup.  The 
CoABS  Grid  is  transport  neutral  in  terms  of  agent  communication.  The  CoABS  Grid 
defines  the  interface,  but  the  agent  proxy  is  free  to  use  any  transport  in  its 
implementation. 

The  CoABS  Grid  currently  provides  an  AgentRep  implementation  that  uses  RMI 
for  message  transport.  Other  transport  mechanisms  are  in  development.  An  AgentRep 
downloaded  to  a  client  is  connected  to  a  MessageQueue  object  local  to  the  agent  using 
RMI.  A  MessageListener  interface  is  also  defined  to  allow  agents  automatic  notification 
of  incoming  messages.  Several  classes  of  CoABS  Grid  messages  are  provided.  Some 
include  text  messages  only,  while  others  allow  data  attachments.  The  CoABS  Grid  is 
language  neutral;  any  agent  communication  language  can  be  used.  It  is  up  to  the 
communicating  agents  to  decipher  the  contents  of  a  message.  The  CoABS  Grid  also 
provides  methods  to  send  a  message  to  a  group  of  agents  matching  a  particular  template 
or  satisfying  a  particular  predicate. 
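
The message queue, listener notification, and group send described above can be sketched in Python. The names below (MessageQueue, add_listener, send_to_group) and the dictionary-based agent descriptions are hypothetical; the sketch shows the pattern only and does not reproduce the CoABS Grid classes.

```python
from collections import deque


class MessageQueue:
    """Hypothetical agent-local queue with optional listener callbacks."""

    def __init__(self) -> None:
        self._messages = deque()
        self._listeners = []

    def add_listener(self, callback) -> None:
        self._listeners.append(callback)

    def add_message(self, sender: str, text: str) -> None:
        self._messages.append((sender, text))
        for callback in self._listeners:  # automatic notification on arrival
            callback(sender, text)

    def pending(self) -> int:
        return len(self._messages)


def send_to_group(directory, predicate, sender, text) -> int:
    """Deliver text to every agent whose description satisfies the predicate."""
    delivered = 0
    for description, queue in directory:
        if predicate(description):
            queue.add_message(sender, text)
            delivered += 1
    return delivered


directory = [
    ({"name": "monitor-1", "role": "monitoring"}, MessageQueue()),
    ({"name": "monitor-2", "role": "monitoring"}, MessageQueue()),
    ({"name": "fault-1", "role": "fault"}, MessageQueue()),
]

seen = []
directory[0][1].add_listener(lambda sender, text: seen.append((sender, text)))

count = send_to_group(
    directory,
    lambda d: d["role"] == "monitoring",
    sender="control-1",
    text="report congestion",
)
```

The message body here is plain text; interpreting it is left to the receiving agents, in keeping with the language-neutral design described above.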

Agent communication is fully distributed, in that each agent sending a message
communicates directly with the receiver, using the proxy registered by the receiver. The
sender is unaware of the transport mechanism being used, although currently RMI is the
default. Thus, as the number of agents on the CoABS Grid increases, agent
communication performance is only affected by the distribution of the agents in the
network, the network bandwidth, and the details of the particular AgentRep
implementation. For this reason, this portion of the architecture is thought to be highly
scalable.

6.  CoABS  Experimentation  and  Findings 

CoABS  has  undergone  and  is  undergoing  a  number  of  experiments  to  test  the 
capabilities  of  intelligent  agents  in  a  multi-agent  environment.  There  are  four  ongoing 
Technical  Integration  Experiments  (TIEs)  [51]  within  the  CoABS  program.  Each  TIE 
involves  a  number  of  researchers  who  have  come  together  to  solve  a  particular  problem 
by  combining  their  research  efforts.  The  Coalition  Agents  Experiment  (CoAX)  is 
addressing  the  unique  aspects  of  achieving  coherent  Coalition  operations  from  diverse 
“come-as-you-are”  elements.  The  Mixed-Initiative  Agent  Team  Administration  (MIATA) 
TIE  is  exploring  approaches  to  human/agent  coordination  in  large,  continuous  C2 
organizations.  The  Electric  Elves  TIE  is  investigating  the  use  of  teams  of  software  agents 
to  aid  humans  in  facilitating  an  organization’s  coherent  functioning  and  rapid  response  to 
crises,  while  reducing  the  burden  on  humans.  The  Mobility  TIE  is  conducting 
experiments to determine under what conditions mobile agents should be deployed, as
well  as  building  CoABS  Grid  components  to  facilitate  the  movement  of  mobile  agents 
between  heterogeneous  mobile  agent  platforms. 

One of the more important experiments for a multi-agent system, as well as for
intelligent-agent-based network management, is scalability of the infrastructure. In
August 2000, the CoABS program conducted an experiment [49] on the scalability of the
CoABS Grid Infrastructure with respect to agent lookup. The experiment
investigated how lookup times scaled as the lookup service became more populated. The
experiments were designed to give a qualitative understanding of whether performance
problems become apparent with a highly populated LUS.

In the experiments, 500 agents were registered at a time with the CoABS Grid,
until a total of 10,000 agents were registered. Twenty-two different agent queries were
performed after each group of 500 agents was registered, measuring how long it took to
fetch the agents and the number of agents retrieved. The experiments found that
sequential lookup scales well to 10,000 agents. Lookups that retrieved a single agent, as
well as lookups that matched no agents, were not affected by the number of agents
registered. Additionally, the time to look up multiple agents increased proportionally to the
number of agents retrieved. Finally, the time to look up multiple agents is independent of the
number of agents registered. Experiments performed by an independent subcontractor
also found constant lookup times for lookups that returned 120 agents with a maximum
of 600 agents registered, and lookups that returned 250 agents with a maximum of 2500
agents registered. Table 1 provides a summary of the experiment results.


Table 1. Lookup Experiment Results Summary [49]

Processor     Agents on     Agents        Least time  Least time  Time to     Blue      Time to      Blue/French
              this machine  (cumulative)  to look up  to look up  look up     agents    look up      agents
                                          1 agent*    0 agents*   Blue        found     Blue/French  found
                                          (ms)        (ms)        agents**              agents***
                                                                  (ms)                  (ms)

PIII 733 MHz  Jini™ LUS     N/A           N/A         N/A         N/A         N/A       N/A          N/A
PIII 600 MHz  Lookup tests  N/A           N/A         N/A         N/A         N/A       N/A          N/A
PIII 400 MHz  500           500           90          10          771         39        200          6
PII 500 MHz   500           1,000         50          10          1543        77        310          11
PIII 400 MHz  500           1,500         80          10          1863        116       471          17
PIII 733 MHz  500           2,000         70          10          2664        154       511          22
PIII 733 MHz  500           2,500         80          10          2774        193       661          28
PIII 500 MHz  500           3,000         70          10          3104        231       801          33
PII 300 MHz   500           3,500         70          10          4446        270       831          39
PIII 866 MHz  500           4,000         70          10          4417        308       1002         44
PII 450 MHz   500           4,500         80          10          4947        347       1171         50
PIII 450 MHz  500           5,000         70          10          4917        385       1082         55
PII 400 MHz   500           5,500         70          10          5828        422(424)  1241         61
PII 450 MHz   500           6,000         70          10          5848        462       1302         66
PII 213 MHz   500           6,500         70          10          7361        498(501)  1502         72
PIII 733 MHz  500           7,000         70          10          7191        539       1402         77
PIII 650 MHz  500           7,500         70          10          8142        577(578)  1662         83
PIII 550 MHz  500           8,000         70          10          7681        613(616)  1602         87(88)
PIII 400 MHz  500           8,500         70          10          9393        653(655)  1943         93(94)
PIII 400 MHz  500           9,000         70          10          11377       692(693)  5047         98(99)
PIII 677 MHz  500           9,500         70          10          9604        730(732)  2113         104(105)
PIII 733 MHz  500           10,000        70          N/A         9884        769(770)  2123         109(110)

*   Twenty separate lookup tests: looked up agents with size = 0, 500, 1000, 1500, ..., 10,000; zero or one agent of each size
**  One-attribute lookup test: find all agents with TestEntry.color = Color.blue
*** Two-attribute lookup test: find all agents with TestEntry.color = Color.blue and TestEntry.language = "French"

Figure  23  shows  that  lookup  performance  of  the  LUS  appeared  to  degrade  when 
looking  up  multiple  agents  as  the  LUS  was  populated  with  up  to  10,000  agents. 


Figure 23. Looking Up Multiple (Blue) Agents [49] (horizontal axis: Number of Agents Registered, Cumulative)

Figure  24  shows  a  normalized  graph  that  indicates  that  the  lookup  of  multiple 
agents  is  roughly  proportional  to  the  number  of  agents  retrieved. 


Figure  24.  Normalized  Looking  Up  Multiple  (Blue)  Agents  [49] 
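
The normalization behind Figure 24 can be checked with a short calculation. The sketch below divides the lookup time by the number of agents retrieved for a handful of rows, using the blue-agent values as read from Table 1 (the rows at 500, 2,000, 4,000, 6,000, and 10,000 cumulative agents). The choice of rows is arbitrary and made only for illustration.

```python
# (blue agents found, lookup time in ms), as read from Table 1
rows = [(39, 771), (154, 2664), (308, 4417), (462, 5848), (769, 9884)]

# Time per agent retrieved, in milliseconds, rounded to one decimal place
per_agent_ms = [round(ms / found, 1) for found, ms in rows]

# Ratio of the largest to the smallest per-agent cost across these rows
spread = max(per_agent_ms) / min(per_agent_ms)
```

For these rows the per-agent cost stays between roughly 13 ms and 20 ms while the number of agents retrieved grows about twentyfold, which is consistent with lookup time being roughly proportional to the number of agents retrieved.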


In April 2001, a second set of experiments was run with a newer version of the
Grid code. These experiments repeated the lookup experiments done previously.
Measurements of how long it took to register the agents as the lookup service became
more populated were also collected. Figure 25 shows the registration time results [51]:


Figure 25. Results from April 2001 experiment: Registration Time as Total Agents Registered Increases [51]

B.  CONCEPTUAL  APPLICATION  FOR  ENTERPRISE  NETWORK 
MANAGEMENT  IN  THE  ARMY 

Having discussed the requirements and issues of enterprise network management
in the Army, and having looked at the many capabilities that intelligent agent technology and
multi-agent systems bring, this section discusses the conceptual application of these
technologies for the Army Enterprise Infostructure initiative. This section describes a
high-level enterprise network management architecture by integrating the concepts of an
intelligent-agent-based architecture [10], designed by SRI International, on top of the
CoABS multi-agent environment. The idea here is just to discuss an approach that
emphasizes the potential of these technologies to enhance the Army's efforts. This
architecture is by no means the best approach or optimal intelligent-agent-based solution;
there are many other ways to exploit these and other agent-based technologies.

While CoABS has shown promising results for the interoperability of command
and control systems within the military, this technology has not been applied to the
NM domain. CoABS is well suited to enhancing NM capabilities. As discussed in the previous
section, the CoABS Grid demonstrates the rich features (such as scalability, reliability,
flexibility, and adaptability) that are necessary for enterprise network management in the
Army. The Grid is capable of internetworking heterogeneous intelligent agent NM
platforms in a fashion analogous to routers internetworking heterogeneous local area
networks. Such an environment allows the enterprise to monitor and control the network
in an efficient, effective, and low-cost manner.

1.  The  Intelligent-Agent-Based  Enterprise  Management  Architecture 

The  agent  architecture  described  in  this  section  is  integrated  on  top  of  CoABS  to 
allow  for  agent  communication,  functionality,  and  services,  and  for  central  enterprise 
management  and  control  of  the  network.  It  is  based  on  a  federated  and  hierarchical  design 
where  agents  communicate,  coordinate,  and  collaborate  over  the  CoABS  infrastructure, 
but  implements  control  of  the  network  nodes  in  a  hierarchical  manner.  CoABS  facilitates 
interoperability  of  legacy  systems,  proprietary  agent  environments  that  are  Grid-aware, 
and  other  disparate  systems  that  are  Grid-aware.  The  architecture  also  provides  the 
necessary  capability  to  aggregate  NM  situational  awareness  information  into  the 
enterprise  NM  common  operational  picture  (see  Fig.  26). 


Figure 26. Intelligent-Agent-Based High-level Architecture

The architectural approach [10] is based on an infrastructure in which data-driven
processes  (i.e.,  intelligent  local  agents)  can  analyze  network  conditions  and  post  the 
results  to  a  shared  memory  structure.  Each  of  the  agents  represents  a  different 
technology,  and  is  activated  upon  receiving  the  appropriate  data  or  information.  The 
advantage  of  this  approach  is  that  when  a  new  method  of  analyzing  data  is  developed, 
that  method  can  be  incorporated  into  the  architecture  as  a  new  agent.  The  adjustments  are 
then  limited  to  registering  that  agent  and  giving  the  agent  the  ability  to  produce  and 
evaluate  agent-to-agent  messages.  Thus,  the  overhead  for  changes  is  much  less  than  with 
traditional  approaches. 

As shown in Figure 27, a set of localized agents is embedded within each
managed network node to perform specific management tasks. Separate agents exist to
carry out tasks for a specific NM function: fault, configuration, accounting, performance,
or security management. Only performance and fault management are addressed in this
architecture, although the model can incorporate others. With this design, intelligent NM
functionality  is  pushed  out  to  the  devices  where  autonomous  troubleshooting, 
computational  processing,  and  repair  can  occur  dynamically.  Agents  perform  their  NM 
functions  within  a  network  node  by  analyzing  data  as  it  arrives  and  posting  results  or 
conclusions  to  a  shared  memory.  While  it  is  processing  data,  each  agent  has  access  to  the 
shared  memory  and  the  conclusions  of  other  agents. 

Figure 27. The Intelligent-Agent-Based Network Node Architecture


The  agent  architecture  consists  of  the  following  four  functional  intelligent  agents 
that  perform  very  different  tasks  (see  Fig.  27)  [10]: 

•  Control  Agent  -  uses  symbolic  and  analogical  reasoning  to  synthesize 
results  from  the  other  (local  and  peer-to-peer)  agents,  additional  network 
information,  and  information  from  other  nodes,  and  to  make  decisions 
based  on  QoS  considerations. 

•  Fault-Diagnostic  Agent  -  uses  rules  to  isolate  faults  in  the  network.  This 
agent  usually  works  over  a  longer  timeframe  than  other  agents.  The 
incoming  data  typically  identifies  deviant  network  behavior  (originating 
from  the  network)  or  potential  hot  spot  areas  (originating  from  other 
agents),  and  the  problem-solving  process  of  fault  isolation  and  recovery 
may  not  occur  in  real  time. 

•  Monitoring  Agent  -  uses  statistical  and  algorithmic  approaches  to  analyze 
information  from  all  levels  of  network  performance,  in  real  time.  The 
Monitoring  Agent  is  interested  in  collecting  situational  awareness  data 
originating  from  the  network  and  in  real-time  analyses  of  that  data  (e.g., 
congestion  detection). 

•  Problem  Discovery  Agent  -  uses  rules  and  algorithms  to  isolate 
geographic  problem  areas  in  the  network  that  need  further  investigation,  in 
a  short  timeframe.  The  Problem  Discovery  Agent  is  interested  in 
discovering problem areas in a short timeframe and in avoiding transient
problems. The information analyzed by this agent comes directly from the
network. 

The  Control  Agent  serves  as  the  decision  maker,  ensuring  that  the  local  agents 
work  together  towards  a  common  goal.  It  synthesizes  results  and  data  reported  by  the 
other local agents and the information passed on by other nearest-neighbor Control
Agents  and  then  makes  control  decisions  to  optimize  the  QoS  of  the  network.  The 
Control  Agent  thus  takes  a  local  view  within  each  node,  but  also  considers  the  more 
global  view  of  the  network  through  communication  with  its  nearest  neighbor  nodes.  The 
Control  Agent  correlates  the  results  of  all  the  local  agents,  receives  requests  from  the 
local  agents  for  different  data  or  monitoring  needs,  requests  data  from  other  Control 
Agents  of  other  nodes  (i.e.,  peer-to-peer  or  global  communication),  and  makes  intelligent 
control  decisions  for  optimizing  the  network  performance. 

Within  each  local  node,  the  agents  are  not  aware  of  the  other  agents  in  the  node. 
They are only aware of the data that is available in the shared memory and the data
passed to them by the Control Agent. This design approach was implemented to reduce
the  overhead  within  the  nodes  created  by  agent-to-agent  communication  messages. 

The  Control  Agent  can  perform  what  is  known  as  “elastic  monitoring,”  [10] 
where  the  monitoring  of  the  network  is  varied  according  to  the  current  network  state  (i.e., 
congestion and other factors). In this manner, care is taken to avoid monitoring at levels
that would exacerbate any network problems, as can happen with traditional models.
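
One way to picture elastic monitoring is a polling interval that stretches as the link gets busier. The following is only a sketch under stated assumptions: the function name polling_interval, the default values, and the inverse-headroom rule are illustrative choices and do not come from [10].

```python
def polling_interval(utilization, base=10.0, max_interval=300.0):
    """Return seconds between polls; back off as the link gets busier.

    utilization:  fraction of link capacity in use, 0.0 to 1.0.
    base:         interval used on an idle link (hypothetical default).
    max_interval: upper bound so that monitoring never stops entirely.
    """
    utilization = min(max(utilization, 0.0), 1.0)   # clamp bad inputs
    headroom = 1.0 - utilization
    if headroom <= 0.0:
        return max_interval
    # Less headroom means fewer polls, so monitoring adds less load.
    return min(base / headroom, max_interval)


for load in (0.0, 0.5, 0.9, 1.0):
    print(load, polling_interval(load))
```

A Control Agent could combine such a rule with targeted requests, polling a suspect area more often while keeping the total monitoring load bounded.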

2.  Knowledgebase 

Instead of storing management information within the traditional management
information base (MIB) as in the SNMP model, this architecture incorporates
semantically designed knowledgebases. The advantage of a knowledgebase is that it
gives agents the capability to reason and infer for intelligent decision-making. Extensible
Markup Language-based (XML-based) technologies can be used to structure the
knowledgebase, thus making it semantically enabled for agents to understand. For
instance, the knowledgebase structure can be implemented with the OWL Web Ontology
Language [52], which creates a marked-up data structure schema where agents can
understand and process the content of information and reason and infer based on the
ontology structural relationships.

For  legacy  systems  that  are  incompatible,  XML-based  technologies  can  be  used  to 
wrap  the  database  and  extract  the  contextual  management  information.  The  contextual 
information  is  then  stored  in  the  knowledgebase. 
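
As a rough illustration of wrapping legacy data, the sketch below turns a flat record into XML using the Python standard library. The element name managedNode, the function wrap_legacy_record, and the sample record are assumptions made for this example; a real knowledgebase would follow an agreed schema or ontology rather than these ad hoc tags.

```python
import xml.etree.ElementTree as ET


def wrap_legacy_record(record):
    """Wrap a flat legacy record in XML so agents can read it by field name."""
    node = ET.Element("managedNode", attrib={"id": record["id"]})
    for key, value in record.items():
        if key == "id":
            continue                       # already stored as an attribute
        child = ET.SubElement(node, key)
        child.text = str(value)
    return ET.tostring(node, encoding="unicode")


# Hypothetical record extracted from a legacy management database.
legacy = {"id": "router-7", "ifOperStatus": "down", "ifInErrors": 42}
print(wrap_legacy_record(legacy))
```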

The  knowledgebase  also  incorporates  the  shared  memory  where  the  various 
agents  deposit  critical  information  for  future  reference.  This  provides  a  resource  for 
agents  to  learn  from  past  experience.  For  example,  the  Fault  Diagnostic  agent  can  query 
the  knowledgebase  to  determine  recovery  procedures  captured  in  the  shared  memory. 

The knowledgebase allows the Control Agent to learn from past experience,
reason,  and  determine  the  best  control  action  for  the  network  autonomously. 
Additionally,  the  Control  Agent  correlates  the  analyses  posted  in  the  shared  memory  by 
the other agents. This capability tremendously improves network management, especially
at the enterprise level, where a large part of the burden is now distributed across the
network.  Of  course,  the  human  administrator  sets  and  controls  the  degree  of  autonomous 
agent  actions. 

3.  Integration  with  CoABS  Grid 

For integration of the agent architecture onto the CoABS Grid, the CoABS
middleware application is loaded on the network nodes. This makes the local device
agents Grid-aware, lets them interface with the CoABS Grid services, and allows them
to communicate and collaborate over the Grid. Messages and alarms are
forwarded to the enterprise management entity over the Grid. Additionally, the enterprise
entity can launch agents over the Grid to query selected devices on an as-needed basis.

4.  Network  COP 

Situational awareness of the network is captured by the Monitoring Agent on each
of the nodes. Each node collects and stores only the situational awareness information
that is pertinent to itself. The Control Agents at the nodes retrieve situational awareness
information and report it to the enterprise NETCOP as events occur. Situational
awareness information is stored in a NETCOP knowledgebase, where the Monitoring
Agent in the management station aggregates it and
displays it for administrator review. The Control Agent in the management station can
analyze information in the NETCOP knowledgebase and provide recommendations for
decision support.

5.  Example 

Here is a short example that sums up the functionality of the agent architecture
[10]:


Suppose  that  the  Monitoring  Agent  has  detected  and  reported  congestion 
in  a  general  area  of  the  network.  This  is  reported  to  the  Control  Agent, 
which  then  retrieves  cases  analogous  to  the  problem  description.  On  the 
analogy  of  one  such  case,  the  Control  Agent  decides  to  ask  the  Problem 
Detection  Agent  to  locate  the  problem  area  to  a  finer  granularity,  and  to 
ask  the  Monitoring  Agent  to  increase  monitoring  in  that  area  of  the 
network  without  substantially  increasing  the  load  on  the  network.  The 
Control  Agent  also  asks  its  nearest  neighbor  nodes  if  they  are  seeing 
trouble  in  the  same  area  and  if  they  have  come  to  any  conclusions  as  to  the 
cause.  In  the  meantime,  the  Problem  Discovery  Agent  has  narrowed  down 
the  problem  to  an  area  of  the  network  that  is  not  responding.  This  implies 
a  fault,  so  the  Control  Agent  issues  a  request  to  the  Fault  Diagnostic  Agent 
to  isolate  the  fault.  The  Fault  Diagnostic  Agent  uses  the  shared  memory 
data  and  issues  requests  for  additional  monitoring  if  needed,  while 
performing  its  fault  isolation  activities.  Once  the  fault  is  isolated,  the  Fault 
Diagnostic  Agent  posts  it  on  the  shared  memory  and  announces  the  event 
to  the  Control  Agent.  The  Control  Agent  then  notifies  its  nearest  neighbor 
nodes  of  its  conclusions. 

6.  Closing  Comments 

Relative to the traditional models, the design described above offers increased
reliability, because problems are isolated at the device and agent action is no
longer at the mercy of the network; improved adaptability, because the
agents have the intelligence to respond to changing conditions; and increased flexibility,
because the agents are modular and have specific independent responsibilities. Network
management becomes fully distributed, while control and management of the entire
network are still maintained from a central location. These properties are also
inherent in the CoABS Grid. The CoABS platform provides scalability, which is essential
for the AEI. CoABS provides this scalability without constraining network resources as
the traditional protocols do.

VII. CONCLUSION AND RECOMMENDATION FOR FUTURE RESEARCH


A.  CONCLUSION 

This study concludes that in theory, intelligent-agent-based technologies are a
leading solution, among other current technologies, to achieve the Army’s enterprise
network management goals for the AEI. The research brought out and discussed the
many shortcomings of the traditional protocols such as SNMP that will hinder AEI
implementation. It elaborated on the robust capabilities of intelligent agents and
multi-agent systems and how they can be applied to mitigate many of the SNMP shortfalls. An
argument  was  made  that  demonstrated  the  advantages  of  intelligent-agent-based  NM  over 
SNMP  and  other  protocols.  The  study  further  reviewed  the  DoD’s  overarching 
JV2010/2020  efforts  and  the  DoD’s  and  the  Army’s  approaches  to  establishing  enterprise 
NM  systems.  Finally,  the  study  presented  a  conceptual  architecture  that  showed  how 
intelligent-agent-based  technologies  can  be  applied  to  achieve  the  Army’s  AEI 
objectives. 

The research and analysis discussed and answered the research questions as follows:

1.  What  alternative  technologies  can  scale  and  meet  the  Army  Enterprise 
Infostructure  (AEI)  network  management  and  situational  awareness  requirements? 

As  mentioned  throughout  this  paper,  intelligent-agent-based  technologies  offer 
the  scalability,  flexibility,  reliability,  and  adaptability  needed  for  the  AEI  and  situational 
awareness  requirements.  These  characteristics  were  illustrated  in  the  conceptual 
architecture  where  an  agent-based  model  was  placed  on  top  of  the  CoABS  Grid  to  create 
the dynamic environment. Although intelligent agent technologies are a long way off, they are
an alternative that the Army should study for future use as JV2010/2020 evolves.

Sub-research  questions: 

a.  How  can  intelligent-agent-based  technologies  be  used  to  establish 
network  management  control  and  network  situational  awareness  of 
the  AEI? 

As discussed in Chapter III, intelligent agents can assume many properties
that  allow  them  to  dynamically  manage  and  control  a  network  environment. 
The  multi-agent  system  provides  the  infrastructure  where  agents  can 
communicate  and  collaborate  to  collectively  solve  NM  problems  as  needed. 
This  was  also  discussed  in  the  conceptual  architecture  where  intelligence  was 
pushed  out  to  the  managed  devices  where  NM  could  occur  autonomously. 

b.  How  can  intelligent-agent-based  technologies  be  used  to  execute  Fault, 
Configuration,  Accounting,  Performance,  and  Security  (FCAPS) 
management  or  establish  a  network  common  operational  picture 
(NETCOP)? 

This was depicted in the conceptual architecture, where each managed device
hosts various intelligent agents that perform FCAPS functions and report
only distilled knowledge to the enterprise operations. The NETCOP is
aggregated  by  the  device  control  agents  reporting  to  the  enterprise  NETCOP 
knowledgebase  as  the  situation  dictates. 

c.  How  does  intelligent-agent-based  technology  for  enterprise  network 
management  compare  to  SNMP-based  and  other  distributed 
management  technologies? 

This  question  was  answered  in  Chapter  II  where  the  argument  for  intelligent 
agents in NM was made. The Lucent experiments demonstrated that an
intelligent-agent-based platform was more dynamic, could handle multiple
standards, overcame interoperability problems, was inherently distributed, and
offered the potential for much greater cost savings than the traditional
protocols.

d.  Can  an  intelligent-agent-based  network  management  architecture 
scale  to  support  the  AEI? 

The CoABS experiments found that sequential lookup scales well to 10,000
agents. Lookups that retrieved a single agent, as well as lookups that matched
no agents, were not affected by the number of agents registered. Additionally,
the time to look up multiple agents increased proportionally to the number of
agents retrieved. Finally, the time to look up multiple agents is independent of the
number of agents registered.

e.  How  can  the  Control  of  Agent  Based  Systems  (CoABS)  be  leveraged 
to  support  an  intelligent-agent-based  enterprise-level  network 
management  architecture  for  the  AEI? 

As  discussed  in  Chapter  VI,  the  CoABS  Grid  is  a  framework  for  federating 
heterogeneous  agent  systems  designed  to  meet  the  challenges  of  the  military 
environment,  as  well  as  address  the  heterogeneity  among  the  participating 
agent  research  communities.  The  CoABS  Grid  is  an  adaptive  and  robust 
collection  of  infrastructure,  services,  agents,  standards,  and  protocols.  It 
enables  the  run-time  integration  and  dynamic  interoperability  of  distributed 
agents,  objects,  devices,  and  legacy  systems.  As  explained  in  the  conceptual 
architecture,  CoABS  serves  as  the  infrastructure  that  allows  the  agents  to 
accomplish  their  goals. 

B.  RECOMMENDATION  FOR  FUTURE  RESEARCH 

This  study  gives  a  comprehensive  review  of  how  agent-based  technologies  can  be 
used  to  solve  the  enterprise  NM  issues.  However,  this  area  certainly  merits  much  more 
research  and  application.  The  Army  should  investigate  the  issues  and  the  potential  of 
intelligent-agent-based  technologies  with  respect  to  the  AEI  implementation.  A  technical 
solution  should  be  pursued  and  demonstrated  with  a  small-scale  proof-of-concept 
prototype.  The  technologies  in  this  study  are  just  some  of  the  readily  available  and 
accessible technologies. The Army should conduct an all-out research effort, on and off
the commercial market, to find the most suitable and mature technologies that meet the
Army’s requirements.

APPENDIX I  NETWORK MANAGEMENT TECHNOLOGIES


A.  INTRODUCTION 

To  make  the  argument  for  intelligent-agent-based  technologies,  it  is  important  to 
understand  the  functionality  of  the  current  technologies  in  order  to  assess  their 
limitations. This section provides an overview of the most common technologies in use
today, with emphasis on and a more detailed review of SNMP, which is the most
prevalent.

B.  SIMPLE  NETWORK  MANAGEMENT  PROTOCOL  (SNMP) 

Since its inception, SNMP has evolved through three versions (SNMPv1,
SNMPv2, and SNMPv3) that provide increased functionality to overcome some of the
weaknesses inherent in its design. SNMPv1 was too simple: the lightweight nature
of the agent allowed only device status reports and updates, while the
burden of management and data processing resided with the manager. Version 1 also
severely lacked security to protect it from internet sabotage. SNMPv2 introduced the concept
of the intermediary manager [29], or “middle manager.” The intermediary managers
decentralized management responsibility by assuming some of the data processing from
the manager side. Additionally, the intermediaries are capable of performing simple
tasks. SNMPv2 also added bulk transfer of information and other functional
extensions, but still lacked the necessary security. In 1998, the most recent version,
SNMPv3, was issued. SNMPv3 provides the much-needed security features of
authentication, privacy, and access control. It also provides other enhancements such as
modularity, which allows for module upgrades without needing to issue an entire new
standard.

SNMP  is  based  on  three  concepts  [27]:  managers,  agents,  and  the  Management 
Information  Base  (MIB).  In  any  configuration,  at  least  one  manager  node,  called  the 
network  management  station  (NMS),  runs  SNMP  management  software.  Network 
devices  to  be  managed  are  commonly  called  network  nodes  or  network  elements  (NE). 
Devices such as bridges, routers, servers, and workstations are equipped with an agent
software  module.  The  agent  is  responsible  for  providing  access  to  a  local  MIB  of  objects 
that  reflects  the  resources  and  activity  at  its  node.  The  agent  also  responds  to  manager 
commands to retrieve values from the MIB and to set values in the MIB. The MIB is in
essence  a  database  schema  for  storing  management  data,  which  are  called  managed 
objects.  An  example  of  an  object  that  can  be  retrieved  is  a  counter  that  keeps  track  of  the 
number  of  packets  sent  and  received  over  a  link  into  the  node;  the  manager  can  track  this 
value  to  monitor  the  load  at  that  point  in  the  network.  An  example  of  an  object  that  can 
be  set  is  one  that  represents  the  state  of  a  link;  the  manager  could  disable  the  link  by 
setting  the  value  of  the  corresponding  object  to  the  disabled  state.  Many  system  software 
vendors  include  SNMP  manager  and  agent  programs  as  standard  software  components.  It 
is  unusual  nowadays  to  have  to  write  them. 
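
The manager, agent, and MIB roles described above can be mimicked in a few lines. This is a toy model, not an SNMP implementation: the class ToyAgent and the object names linkPackets and linkAdminStatus are hypothetical, and a real MIB identifies objects by OID and exchanges them over the network.

```python
class ToyAgent:
    """Holds a local MIB (here just a dict) and answers get/set requests."""

    def __init__(self):
        # Hypothetical managed objects: a packet counter and a link state.
        self.mib = {"linkPackets": 0, "linkAdminStatus": "enabled"}
        self.writable = {"linkAdminStatus"}

    def count_packet(self):
        # The agent keeps the MIB in step with activity at its node.
        self.mib["linkPackets"] += 1

    def get(self, name):
        return self.mib[name]

    def set(self, name, value):
        if name not in self.writable:
            raise PermissionError(f"{name} is read-only")
        self.mib[name] = value


agent = ToyAgent()
for _ in range(3):
    agent.count_packet()

# Manager side: read the counter to monitor load, then disable the link.
load = agent.get("linkPackets")
agent.set("linkAdminStatus", "disabled")
print(load, agent.get("linkAdminStatus"))
```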


1.  The  SNMP  Agent 

SNMP agents [28] (see Fig. 28) reside in the managed network and communicate
with  network  management  stations.  SNMP  agents  are  embedded  on  managed  devices. 
SNMP  agents  provide  the  following  functionality: 

•  Implementing  and  maintaining  MIB  objects 

•  Responding  to  management  operations  such  as  requests 

•  Generating  notifications,  both  traps  (unacknowledged)  and  informs 
(acknowledged) 

•  Setting  the  access  policy  for  external  managers 

•  Implementing security - SNMPv1 and SNMPv2c support
community-based security with clear-text passwords; stronger
security (authentication and encryption) is available with SNMPv3.
SNMPv3 also provides an access control framework, which
consists of:

o  MIB view - the set of managed objects in an agent MIB
accessible to an SNMP manager. This is the manager’s
client view with respect to the agent.

o  Access mode to managed objects - either READ-ONLY
or READ-WRITE. A READ-ONLY access
mode means that no agent MIB objects can be written
by a manager. MIB views are associated with specific
access modes.

SNMP  agents  can  be  hosted  on  almost  any  computing  device,  including: 
Windows  NT/2000  machines,  UNIX  hosts,  Novell  NetWare  workstations  and  servers, 
and  many  network  devices,  including  hubs,  routers,  switches,  terminal  servers,  PABXs, 
and  so  on. 

The  agent  listens  on  the  managed  device’s  User  Datagram  Protocol  (UDP)  port 
161  for  the  following  SNMP  message  types: 

•  Get  requests  the  values  of  the  specified  object  instances. 

•  Get-next  requests  the  values  of  the  lexical  successors  of  the 
specified  object  instances. 

•  Get-bulk  requests  the  values  of  portions  of  a  table. 

•  Set  modifies  a  specified  set  of  object  instance  values. 

The  above  messages  either  retrieve  (get)  or  modify  (set)  NE  data  as  defined  in  the 
MIB.  The  agent  uses  UDP  port  162  for  sending  notification  messages  to  a  preconfigured 
IP  address. 

2.  The  SNMP  Manager 

SNMP  managers  (NMSs)  [28],  shown  in  Figure  28,  are  the  entities  that  interact 
with  agents.  Their  primary  functions  are:  getting  and  setting  the  values  of  MIB  object 
instances  on  agents,  receiving  notifications  from  agents,  and  exchanging  messages  with 
other  managers. 

3.  The MIB

A  MIB  [28]  is  simply  a  managed-object  data  description.  The  MIB  defines  the 
syntax  (type  and  structure)  and  semantics  of  the  managed  objects.  SNMP  managers  and 
agents  exchange  managed  object  instances  using  the  SNMP  protocol.  Managed  objects 
may  be  defined  using  what  are  called  textual  conventions.  These  are  essentially 
refinements  of  basic  types  (that  are  very  loosely  analogous  to  programming  language  data 
types  or  even  Java/C++  classes).  Some  of  the  textual  conventions  are: 

•  MacAddress  is  an  IEEE  802  MAC  address. 

•  TruthValue is a Boolean value representing true (1) or false (2).

•  TestAndIncr prevents two managers from simultaneously
modifying the same object. Setting an object of type TestAndIncr
to a value other than its current value fails. We will see a similar
mechanism used in the MPLS tables.

•  RowStatus  is  a  standard  way  for  adding  and  removing  entries 
from  a  table  (we  will  see  this  object  used  many  times  in  the  MPLS 
configuration  examples). 

•  StorageType  specifies  how  a  row  should  be  stored. 
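
The TestAndIncr behavior in the list above can be sketched as a small advisory lock. The class name AdvisoryLock and its method names are hypothetical; the sketch assumes that a successful set increments the stored value and that the value wraps to 0 after 2147483647.

```python
class AdvisoryLock:
    """Advisory lock modeled on the TestAndIncr textual convention."""

    MAX = 2147483647  # largest value before wrapping to 0

    def __init__(self, value=0):
        self.value = value

    def set(self, supplied):
        # The set succeeds only if the manager supplies the current value.
        if supplied != self.value:
            return False
        # On success the value changes, so any stale copy now fails.
        self.value = 0 if self.value == self.MAX else self.value + 1
        return True


lock = AdvisoryLock()
seen_by_a = lock.value        # manager A reads the lock
seen_by_b = lock.value        # manager B reads the same value
print(lock.set(seen_by_a))    # A sets first and succeeds
print(lock.set(seen_by_b))    # B holds a stale value and is rejected
```

Manager B would have to re-read the lock and retry, which is how the convention keeps two managers from modifying the same object at once.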

In  addition  to  using  textual  conventions,  MIB  objects  have  common  attributes. 
Managers  use  these  attributes  in  order  to  manipulate  and  understand  MIB  objects.  Below 
is  a  list  of  common  attributes: 


•  SYNTAX:  This  is  the  object  format — for  example,  Unsigned32 
(an  integer),  TruthValue  (a  Boolean  true  or  false),  and 
SEQUENCE  (a  container  of  other  objects). 

•  MAX-ACCESS:  This  specifies  the  accessibility  of  the  object — for 
example,  read-only  means  that  the  object  can  only  be  read  (but  not 
written)  by  managers. 

•  STATUS:  This  is  the  state  of  support  for  the  object  in  the  MIB — 
for  example,  current  means  that  the  object  is  relevant  and  can  or 
should  be  supported. 

•  DESCRIPTION:  This  is  a  text  description  of  the  object. 

•  DEFVAL: This is a default value that the agent can use when the
object  instance  is  first  created. 

•  OBJECT  IDENTIFIER:  This  is  the  unique  name  for  a  MIB 
object,  described  in  the  next  section. 

OIDs and Lexicographical Ordering. All MIB objects have unique names called
object identifiers (OIDs) [28]. An OID is a sequence of 32-bit unsigned integers that
represents a node within a tree-based structure (with a single root). Figure 29 shows the
basic construct of a MIB Tree with assigned OIDs on each node. Only an instance of a
MIB object can be retrieved from an agent. An instance of a MIB object is identified by
an OID concatenated with the instance value. The instance value is a sequence of one or
more 32-bit unsigned integers.


Figure  29.  Sample  MIB  Tree  [31] 


The order of the OIDs, called “lexicographical ordering,” [28] is an important aspect
of SNMP. All objects can be traced from the root in a process called “walking the MIB.”
During a walk, each branch of the MIB tree is traversed from left to right starting at the
root. For example, the standard IP group or table has the OID 1.3.6.1.2.1.4, as illustrated
in  Figure  30.  The  IP  group  and  some  of  its  constituent  objects  are  shown  in  this  diagram 
[28]. 

Figure 30. The MIB-II IP Group [28]


MIBs  are  plain-text  files.  They  are  compiled  into  the  agent  source  code  and 
become  part  of  the  executable  file.  If  a  manager  wants  to  access  some  agent  MIB  objects, 
then  either  the  associated  MIB  module  file  is  needed  or  a  MIB  walk  can  be  attempted. 

Another  important  aspect  of  lexicographic  ordering  is  that  a  manager  can  use  it  to 
“discover” an agent MIB. This is useful when the manager does not have a copy
of the agent MIB and needs to determine what objects the agent supports. The discovery
process  consists  of  walking  the  MIB.  It  should  be  noted  that  this  is  not  a  very  good  way 
of  retrieving  agent  data.  It  is  far  better  to  have  the  MIB  details  at  the  manager  side 
because  the  structure  and  meaning  of  the  NE  data  will  then  be  apparent. 
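
The ordering and the walk described above can be sketched with OIDs held as tuples of integers. The class ToyMib and the helper parse_oid are hypothetical, and the stored values are made up; only the ordering and the repeated get-next logic are the point of the example.

```python
import bisect


def parse_oid(text):
    """Convert dotted text such as '1.3.6.1.2.1.4' to a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


class ToyMib:
    def __init__(self, instances):
        # Tuples of ints compare element by element, which matches the
        # lexicographical ordering of OIDs (so ...4.10 sorts after ...4.2).
        pairs = ((parse_oid(k), v) for k, v in instances.items())
        self.items = sorted(pairs, key=lambda item: item[0])
        self.keys = [oid for oid, _ in self.items]

    def get_next(self, oid):
        """Return the first (oid, value) strictly after the given OID."""
        i = bisect.bisect_right(self.keys, oid)
        return self.items[i] if i < len(self.items) else None

    def walk(self, root):
        """Yield every instance under root using repeated get-next calls."""
        current = root
        while True:
            found = self.get_next(current)
            if found is None or found[0][:len(root)] != root:
                return
            yield found
            current = found[0]


mib = ToyMib({
    "1.3.6.1.2.1.1.1.0": "toy router",   # sysDescr.0
    "1.3.6.1.2.1.4.1.0": 1,              # ipForwarding.0
    "1.3.6.1.2.1.4.2.0": 64,             # ipDefaultTTL.0
    "1.3.6.1.2.1.4.10.0": 1200,          # a later object in the IP group
    "1.3.6.1.2.1.5.1.0": 7,              # first object after the IP group
})
ip_group = parse_oid("1.3.6.1.2.1.4")
for oid, value in mib.walk(ip_group):
    print(".".join(map(str, oid)), value)
```

The walk returns instances but not their meaning, which illustrates why having the MIB module at the manager side is preferable to discovery.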

4.  SNMP  Protocol  Data  Unit  (PDU) 

SNMP  managers  and  agents  communicate  using  a  very  simple  messaging 
protocol.  This  is  a  straightforward  fetch  (get),  store  (set),  and  notification  model. 
Managers  retrieve  agent  data  using  get  operations,  and  they  modify  agent  data  using  set 
operations.  When  agents  want  to  communicate  some  important  event,  they  do  so  by 
sending  a  notification  message  to  a  preconfigured  IP  address.  If  the  agent  wants  to 
receive  an  acknowledgment  from  the  manager,  then  it  sends  an  inform  message.  Below  is 
a  brief  overview  of  the  SNMP  message  formats  [32]. 


[Figure 31 labels: Message header; PDU]

Figure 31. SNMPv1 message format [32]

SNMPv1. SNMPv1 messages contain two parts: a message header and a protocol
data unit (PDU) [28]. Figure 31 illustrates the basic format of an SNMPv1 message.
SNMPv1 message headers contain two fields: Version Number and Community Name.
The  following  descriptions  summarize  these  fields: 

•  Version  number — specifies  the  version  of  SNMP  used. 

•  Community  name — defines  an  access  environment  for  a  group  of 
NMSs.  NMSs  within  the  community  are  said  to  exist  within  the 
same  administrative  domain.  Community  names  serve  as  a  weak 
form  of  authentication  because  devices  that  do  not  know  the  proper 
community  name  are  precluded  from  SNMP  operations. 

Figure 32. SNMPv1 PDU [32]


SNMPv1 PDUs contain a specific command (Get, Set, and others) and operands
that indicate the object instances involved in the transaction (see Fig. 28). The SNMPv1
Get, GetNext, Response, and Set PDUs contain the same fields. Note that SNMPv1 PDU
fields  are  variable  in  length.  The  following  descriptions  summarize  the  fields  illustrated 
in  Figure  32: 

•  PDU type - specifies the type of PDU transmitted.

•  Request ID - associates SNMP requests with responses.

•  Error status - indicates one of a number of errors and error types.
Only the response operation sets this field. Other operations set
this field to zero.

•  Error index - associates an error with a particular object instance.
Only the response operation sets this field. Other operations set
this field to zero.

•  Variable bindings - serves as the data field of the SNMPv1 PDU.
Each variable binding associates a particular object instance with
its current value (with the exception of Get and GetNext requests,
for which the value is ignored).
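
The header and PDU fields listed above map naturally onto two small record types. The sketch below models the fields only; it does not perform the wire encoding, and the names Pdu, SnmpV1Message, and respond are hypothetical. It assumes the SNMPv1 conventions that the version field carries 0 and that error status 2 means the requested name does not exist.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

NO_ERROR, NO_SUCH_NAME = 0, 2


@dataclass
class Pdu:
    pdu_type: str                  # e.g. "get", "get-next", "set", "response"
    request_id: int                # lets the manager match replies to requests
    error_status: int = 0          # set only by responses
    error_index: int = 0           # 1-based position of the failing binding
    variable_bindings: List[Tuple[str, Any]] = field(default_factory=list)


@dataclass
class SnmpV1Message:
    version: int                   # 0 denotes SNMPv1
    community: str
    pdu: Pdu


def respond(request, mib):
    """Answer a get request from a dict that maps OID strings to values."""
    bindings = []
    pairs = request.pdu.variable_bindings
    for index, (oid, _ignored) in enumerate(pairs, start=1):
        if oid not in mib:
            # Echo the request bindings and point at the failing one.
            failed = Pdu("response", request.pdu.request_id,
                         NO_SUCH_NAME, index, list(pairs))
            return SnmpV1Message(request.version, request.community, failed)
        bindings.append((oid, mib[oid]))
    ok = Pdu("response", request.pdu.request_id, NO_ERROR, 0, bindings)
    return SnmpV1Message(request.version, request.community, ok)


mib = {"1.3.6.1.2.1.1.3.0": 12345}     # sysUpTime.0, value made up
request = SnmpV1Message(0, "public", Pdu(
    "get", 7, variable_bindings=[("1.3.6.1.2.1.1.3.0", None)]))
reply = respond(request, mib)
print(reply.pdu.request_id, reply.pdu.error_status, reply.pdu.variable_bindings)
```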

Figure 33. SNMPv1 Trap PDU [32]

Figure 33 shows the message format for an SNMPv1 Trap message. The following
descriptions  provide  a  summary  of  the  fields: 

•  Enterprise - identifies the type of managed object generating the
trap.

•  Agent address - provides the address of the managed object
generating the trap.

•  Generic trap type - indicates one of a number of generic trap
types.

•  Specific trap code - indicates one of a number of specific trap
codes.

•  Time stamp - provides the amount of time that has elapsed
between the last network reinitialization and generation of the trap.

•  Variable bindings - the data field of the SNMPv1 Trap PDU.
Each variable binding associates a particular object instance with
its current value.

SNMPv2. The Get, GetNext, and Set operations used in SNMPv1 are exactly the
same as those used in SNMPv2. However, SNMPv2 adds and enhances some protocol
operations. The SNMPv2 Trap operation, for example, serves the same function as that
used in SNMPv1, but it uses a different message format and is designed to replace the
SNMPv1 Trap.

SNMPv2  also  defines  two  new  protocol  operations:  GetBulk  and  Inform.  The 
GetBulk  operation  is  used  by  the  NMS  to  efficiently  retrieve  large  blocks  of  data,  such  as 
multiple  rows  in  a  table.  GetBulk  fills  a  response  message  with  as  much  of  the  requested 
data as will fit. The Inform operation allows one NMS to send trap information to another
NMS  and  to  then  receive  a  response.  In  SNMPv2,  if  the  agent  responding  to  GetBulk 
operations  cannot  provide  values  for  all  the  variables  in  a  list,  it  provides  partial  results. 

Figure 34. SNMPv2 GetBulk PDU [32]


The SNMPv2 GetBulk PDU consists of seven fields, as shown in Figure 34. The
following  descriptions  summarize  the  GetBulk  fields: 

•  PDU  type — identifies  the  PDU  as  a  GetBulk  operation. 

•  Request  ID — associates  SNMP  requests  with  responses. 

•  Non  repeaters — specifies  the  number  of  object  instances  in  the 
variable  bindings  field  that  should  be  retrieved  no  more  than  once 
from  the  beginning  of  the  request.  This  field  is  used  when  some  of 
the  instances  are  scalar  objects  with  only  one  variable. 

•  Max  repetitions — defines  the  maximum  number  of  times  that 
other  variables  beyond  those  specified  by  the  Non  repeaters  field 
should  be  retrieved. 

• Variable bindings — serves as the data field of the SNMPv2 PDU.

Each  variable  binding  associates  a  particular  object  instance  with 
its  current  value  (with  the  exception  of  Get  and  GetNext  requests, 
for  which  the  value  is  ignored). 
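
The interaction between the Non repeaters and Max repetitions fields can be sketched as follows. This is a minimal illustration, not an SNMP implementation: the toy MIB contents, the function names get_next and get_bulk, and the representation of OIDs as integer tuples are all assumptions made for the example, and message-size limits are ignored.

```python
from bisect import bisect_right

# Toy agent MIB (hypothetical values): OIDs as integer tuples mapped to values.
MIB = {
    (1, 3, 6, 1, 2, 1, 1, 3, 0): 123456,          # sysUpTime.0 (scalar)
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 2, 1): "eth0",    # ifDescr.1
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 2, 2): "eth1",    # ifDescr.2
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10, 1): 1000,     # ifInOctets.1
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10, 2): 2000,     # ifInOctets.2
}
SORTED_OIDS = sorted(MIB)  # tuple ordering matches lexicographic OID ordering
END_OF_MIB_VIEW = "endOfMibView"


def get_next(oid):
    """Return the (oid, value) pair that lexicographically follows `oid`."""
    i = bisect_right(SORTED_OIDS, oid)
    if i == len(SORTED_OIDS):
        return oid, END_OF_MIB_VIEW
    nxt = SORTED_OIDS[i]
    return nxt, MIB[nxt]


def get_bulk(requested, non_repeaters, max_repetitions):
    """Build the variable bindings an agent would return for a GetBulk request."""
    bindings = []
    # The first `non_repeaters` variables are retrieved once each.
    for oid in requested[:non_repeaters]:
        bindings.append(get_next(oid))
    # The remaining variables are advanced up to `max_repetitions` times.
    cursors = list(requested[non_repeaters:])
    for _ in range(max_repetitions):
        if not cursors:
            break
        row = [get_next(oid) for oid in cursors]
        bindings.extend(row)
        cursors = [oid for oid, _ in row]
    return bindings


if __name__ == "__main__":
    request = [
        (1, 3, 6, 1, 2, 1, 1, 3),            # sysUpTime (scalar, non-repeater)
        (1, 3, 6, 1, 2, 1, 2, 2, 1, 2),      # ifDescr column
        (1, 3, 6, 1, 2, 1, 2, 2, 1, 10),     # ifInOctets column
    ]
    for oid, value in get_bulk(request, non_repeaters=1, max_repetitions=2):
        print(".".join(map(str, oid)), "=", value)
```

With one non-repeater and two repetitions, the single request returns the scalar once and two rows of the two table columns, which would otherwise take several GetNext exchanges.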

SNMPv3. SNMPv3 [33] is designed to be backward compatible with SNMP
versions 1 and 2 and to add security in the form of access control, authentication, and
encryption to existing SNMP implementations. As such, version 3 is essentially version 2
with  the  addition  of  security  features  and  other  enhancements.  Two  of  the  most 
significant  additions  provided  by  SNMPv3  are  the  User-based  Security  Model  (USM) 
and View-based Access Control Model (VACM).

The User-based Security Model (USM) of SNMPv3 defines mechanisms for
providing  message-level  security  for  SNMP  implementations.  The  USM  is  designed  to 
protect  against  threats  such  as: 

•  Modification  of  information  -  changing  management  information 
in  transit  between  the  SNMP  manager  and  agent 

•  Masquerade  -  a  non-authorized  user  assuming  the  identity  of  a 
user  authorized  to  perform  management  operations 

•  Message  stream  modification  -  reordering  or  copying  packets  in 
a  management  message  stream  for  malicious  purposes 

•  Disclosure  -  a  non-authorized  user  accessing  a  message  in  transit 
to  learn  information  (e.g.,  passwords)  contained  in  the  stream 

SNMPv3  provides  authentication,  ensures  data  integrity,  and  prevents 
masquerading.  After  a  network  manager  logs  on  to  a  management  station  with  a 
username  and  password,  SNMPv3  authentication  consists  of  applying  MD5  (Message 
Digest  5)  or  SHA  (Secure  Hash  Algorithm)  to  PDU  packets  using  a  key.  The  algorithm 
produces  an  authentication  value  and  places  it  in  the  message.  The  receiver  applies  the 
same  algorithm  with  the  same  key  and  checks  if  its  produced  value  is  the  same  as  the  one 
in the message. The key used for authentication is associated with the network
manager’s user name, which is present within the SNMP message. The authentication
functionality  ensures  that  each  SNMP  message  comes  from  an  authorized  manager  or 
agent  and  that  it  was  not  tampered  with  in  transit. 
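
The keyed-digest check described above can be sketched with Python's standard hmac and hashlib modules. This is a simplified illustration under stated assumptions: the USM specification (RFC 3414) defines the HMAC-MD5-96 and HMAC-SHA-96 protocols, which truncate the digest to 12 octets, but the sketch omits key localization and the handling of the authentication field inside the message. The USER_KEYS table, the user name, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-user secret keys shared by manager and agent.
USER_KEYS = {"netadmin": b"0123456789abcdef"}

DIGESTS = {"MD5": hashlib.md5, "SHA": hashlib.sha1}


def compute_auth_value(key: bytes, message: bytes, algorithm: str = "MD5") -> bytes:
    """Keyed digest over the message, truncated to 12 octets (96 bits)."""
    return hmac.new(key, message, DIGESTS[algorithm]).digest()[:12]


def verify_message(user: str, message: bytes, received: bytes,
                   algorithm: str = "MD5") -> bool:
    """Receiver side: recompute the value with the user's key and compare."""
    key = USER_KEYS.get(user)
    if key is None:
        return False  # unknown user name in the message
    expected = compute_auth_value(key, message, algorithm)
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    pdu = b"example PDU bytes"
    tag = compute_auth_value(USER_KEYS["netadmin"], pdu)
    print(verify_message("netadmin", pdu, tag))          # unmodified message
    print(verify_message("netadmin", pdu + b"!", tag))   # altered in transit
```

Because the value depends on both the key and every octet of the message, a receiver that recomputes a different value knows that either the sender did not hold the key or the message was modified.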

The USM also has mechanisms for checking the timeliness of SNMP PDU
delivery using synchronization and time-window checking techniques. This helps detect
messages that have been delayed, which is important because an unusually delayed
message may have been captured and then altered or replayed.
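
A time-window check of this kind can be sketched as below. The sketch follows the rule given in RFC 3414 for an authoritative engine, where a message is rejected if its boot counter differs from the local one or its time value differs from local time by more than 150 seconds; the function and parameter names are assumptions made for illustration.

```python
TIME_WINDOW_SECONDS = 150   # window size used by the USM specification
MAX_ENGINE_BOOTS = 2147483647


def in_time_window(local_boots: int, local_time: int,
                   msg_boots: int, msg_time: int) -> bool:
    """Return True if a received message is considered timely."""
    if local_boots == MAX_ENGINE_BOOTS:
        return False  # boot counter exhausted: nothing is accepted
    if msg_boots != local_boots:
        return False  # message was built against a different reboot epoch
    return abs(local_time - msg_time) <= TIME_WINDOW_SECONDS


if __name__ == "__main__":
    print(in_time_window(3, 1000, 3, 900))   # 100 s old: accepted
    print(in_time_window(3, 1000, 3, 800))   # 200 s old: rejected
    print(in_time_window(3, 1000, 2, 1000))  # stale boot count: rejected
```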

The SNMPv3 View-based Access Control Model (VACM) is designed to control
access to management information based on a user’s identity. The VACM allows
different access levels (read, write, notify) to be defined for different users and for each
piece of MIB information. After a network manager authenticates as specified in the
USM, all SNMP commands generated carry his/her credentials. SNMP agents check the
user’s information against a pre-configured access control database before allowing
access  to  any  MIB  object.  This  gives  network  managers  the  ability  to  define  different 
access  rights  for  different  administrators. 
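
The access check performed by the agent can be sketched as a lookup in a small table. This is a deliberately reduced model: the actual VACM maps users to groups and groups to views built from included and excluded subtrees, whereas the sketch maps user names directly to lists of permitted OID subtrees. The ACCESS_DB contents, the user names, and the function names are hypothetical.

```python
# Hypothetical pre-configured access control database:
# user name -> access level -> list of permitted OID subtrees.
ACCESS_DB = {
    "operator": {
        "read": [(1, 3, 6, 1, 2, 1)],
        "write": [],
        "notify": [(1, 3, 6, 1, 2, 1)],
    },
    "admin": {
        "read": [(1, 3, 6, 1)],
        "write": [(1, 3, 6, 1, 2, 1, 1)],   # system group only
        "notify": [(1, 3, 6, 1)],
    },
}


def in_view(oid, subtrees):
    """True if `oid` falls under any of the given subtree prefixes."""
    return any(oid[:len(prefix)] == prefix for prefix in subtrees)


def is_access_allowed(user, operation, oid):
    """Check a user's request against the access control database."""
    levels = ACCESS_DB.get(user)
    if levels is None:
        return False  # unknown users get no access
    return in_view(oid, levels.get(operation, []))


if __name__ == "__main__":
    sys_name = (1, 3, 6, 1, 2, 1, 1, 5, 0)
    print(is_access_allowed("operator", "read", sys_name))
    print(is_access_allowed("operator", "write", sys_name))
    print(is_access_allowed("admin", "write", sys_name))
```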

With all the added features, the SNMPv3 message format differs significantly
from that of its predecessors. The SNMPv3 message structure is illustrated in Figure 35 [33].


Generated/processed by Message Processing Model:
    msgVersion
    msgID
    msgMaxSize
    msgFlags
    msgSecurityModel

Generated/processed by User Security Model (USM):
    msgAuthoritativeEngineID
    msgAuthoritativeEngineBoots
    msgAuthoritativeEngineTime
    msgUserName
    msgAuthenticationParameters
    msgPrivacyParameters

Scoped PDU (plaintext or encrypted):
    contextEngineID
    contextName
    PDU

Figure  35.  SNMPv3  message  format  with  USM  [33] 


The  first  five  fields  are  generated  by  the  message  processing  model  on  outgoing 
messages  and  processed  by  the  message  processing  model  on  incoming  messages.  The 
next  six  fields  show  security  parameters  used  by  the  security  model,  which  is  invoked  by 
the message processing model to provide security services. Finally, the PDU, together
with the contextEngineID and contextName, constitutes a scoped PDU, used for PDU
processing. The first five fields follow:

•  msgVersion:  Set  to  snmpv3 

•  msgID:  A  unique  identifier  used  between  two  SNMP  entities  to 
coordinate  request  and  response  messages,  and  by  the  message 
processor  to  coordinate  the  processing  of  the  message  by  different 
subsystem models within the architecture

• msgMaxSize: Conveys the maximum size of a message in octets
supported  by  the  sender  of  the  message 

•  msgFlags:  An  octet  string  containing  three  flags  in  the  least 
significant  three  bits:  reportableFlag,  privFlag,  authFlag. 

•  msgSecurityModel:  An  identifier  that  indicates  which  security 
model  was  used  by  the  sender  to  prepare  this  message 
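
As an illustration of the msgFlags field, the sketch below decodes the three low-order bits of the single flags octet. The bit positions follow RFC 3412 (authFlag is the least significant bit, then privFlag, then reportableFlag), and that specification also treats privacy without authentication as an invalid combination. The function name and the returned dictionary layout are assumptions made for the example.

```python
AUTH_FLAG = 0x01        # message is authenticated
PRIV_FLAG = 0x02        # scoped PDU is encrypted
REPORTABLE_FLAG = 0x04  # a Report PDU may be sent in response


def decode_msg_flags(flags: bytes) -> dict:
    """Decode the one-octet msgFlags field into named booleans."""
    if len(flags) != 1:
        raise ValueError("msgFlags must be exactly one octet")
    bits = flags[0]
    auth = bool(bits & AUTH_FLAG)
    priv = bool(bits & PRIV_FLAG)
    if priv and not auth:
        raise ValueError("privacy without authentication is not allowed")
    return {
        "reportable": bool(bits & REPORTABLE_FLAG),
        "priv": priv,
        "auth": auth,
    }


if __name__ == "__main__":
    print(decode_msg_flags(b"\x07"))  # reportable, encrypted, authenticated
    print(decode_msg_flags(b"\x05"))  # reportable, authenticated only
```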

To  summarize  the  different  PDUs  for  each  of  the  versions,  Table  2  below  displays 
a  PDU  taxonomy. 


SNMPv1            SNMPv2c           SNMPv3            Response PDU

GetRequest        GetRequest        GetRequest        GetResponse
GetNextRequest    GetNextRequest    GetNextRequest    GetResponse
SetRequest        SetRequest        SetRequest        GetResponse
Trap              Trap              Trap              None
-                 GetBulkRequest    GetBulkRequest    GetResponse
-                 InformRequest     InformRequest     GetResponse

Table 2. Protocol Data Units in the Different Versions of SNMP [28]

C.  COMMON  MANAGEMENT  INFORMATION  PROTOCOL  (CMIP) 

Common Management Information Protocol (CMIP) [30] is an Open Systems
Interconnection (OSI)-based network management protocol that supports information
exchange between network management applications and management agents. Its design
is similar to the Simple Network Management Protocol (SNMP). CMIP was developed
and funded by government and corporations to replace SNMP and make up for its
deficiencies, thus improving the capabilities of network management systems.

CMIP does not specify the functionality of the network management application;
it defines only the information exchange mechanism of the managed objects, not how
the information is to be used or interpreted. It uses an ISO reliable connection-oriented
transport mechanism and has built-in security that supports access control, authorization,
and security logs.

Communication between the agents and the management application is
accomplished  through  objects.  These  are  the  same  managed  objects  described  in  the 
SNMP  overview.  The  network  management  application  can  initiate  transactions  with 
management  agents  using  the  following  operations: 

•  ACTION  -  Request  an  action  to  occur  as  defined  by  the  managed 
object. 

•  CANCEL_GET  -  Cancel  an  outstanding  GET  request. 

•  CREATE  -  Create  an  instance  of  a  managed  object. 

•  DELETE  -  Delete  an  instance  of  a  managed  object. 

•  GET  -  Request  the  value  of  a  managed  object  instance. 

•  SET  -  Set  the  value  of  a  managed  object  instance. 

CMIP  was  designed  to  be  more  robust  and  efficient  than  SNMP.  Here  are  some 
of  the  built-in  advantages: 

• CMIP is a safer system as it has built-in security that supports
authorization, access control, and security logs.

• CMIP provides powerful capabilities that allow management
applications to accomplish more with a single request.

• CMIP provides better reporting of unusual network conditions.

While CMIP was intended to replace SNMP, it was never fully embraced because
of several limitations arising from its design and its lack of universal acceptance.
Here are a few limitations that stand out:

• CMIP is widely used only in the telecommunication domain,
where devices typically support it.

•  The  CMIP  protocol  is  designed  to  run  on  the  ISO  protocol  stack. 
However,  the  technology  standard  used  today  in  most  LAN 
environments  is  TCP/IP  and  most  LAN  devices  only  support 
SNMP. 

• CMIP requires a large amount of system resources, which has
resulted in very few implementations. Additionally, CMIP is very
complex and therefore difficult to program, so skilled
personnel with specialized training may be required to deploy,
maintain, and operate a CMIP-based network management system.

D.  REMOTE  MONITORING  (RMON) 

In  1995,  the  Internet  Engineering  Task  Force  (IETF)  developed  a  decentralized 
NM  paradigm  called  Remote  Monitoring  (RMON)  [29].  RMON  uses  the  concept  of 
monitors or probes that provide analysis of network traffic, as opposed to relying on devices
with ordinary SNMP agents. Probe implementation can be done as device-embedded
applications  or  as  separate  devices.  The  task  of  a  probe  is  to  monitor  the  network  traffic 
at  its  local  region  and  report  anomalies,  in  the  form  of  alarms,  to  its  manager.  By  defining 
alarm  types  and  alarm  thresholds,  the  manager  is  able  to  offload  some  data  gathering  and 
decision-making (mainly event filtering) to the probes. Furthermore, the probes can also
perform some data pre-processing before forwarding the results to the manager. In general,
such early work toward distributed network management can be considered weak
distribution. The management tasks still reside heavily on the manager side, and some
rudimentary  management  duties  are  delegated  to  intermediary  entities,  in  the  form  of 
event  filtering,  notification,  and  data  pre-processing. 
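
The event filtering that a manager delegates to a probe can be sketched with a pair of thresholds. The sketch assumes a rising and a falling threshold with hysteresis, so that a value hovering around one threshold does not flood the manager with repeated alarms; the class name, the function name, and the sample values are hypothetical and do not correspond to any RMON MIB object.

```python
class ThresholdAlarm:
    """Rising/falling threshold pair with hysteresis, evaluated at the probe."""

    def __init__(self, rising, falling):
        if falling > rising:
            raise ValueError("falling threshold must not exceed rising threshold")
        self.rising = rising
        self.falling = falling
        self._rising_armed = True
        self._falling_armed = True

    def sample(self, value):
        """Return 'rising', 'falling', or None for one monitored sample."""
        if value >= self.rising and self._rising_armed:
            # Fire once, then wait for a falling event before re-arming.
            self._rising_armed = False
            self._falling_armed = True
            return "rising"
        if value <= self.falling and self._falling_armed:
            self._falling_armed = False
            self._rising_armed = True
            return "falling"
        return None


def probe_report(samples, alarm):
    """Return only the (sample index, event) pairs worth sending to the manager."""
    report = []
    for index, value in enumerate(samples):
        event = alarm.sample(value)
        if event is not None:
            report.append((index, event))
    return report


if __name__ == "__main__":
    utilization = [50, 85, 90, 60, 85, 10, 15, 95]  # percent, made-up samples
    alarm = ThresholdAlarm(rising=80, falling=20)
    print(probe_report(utilization, alarm))
```

Of the eight samples in the usage example, only three produce alarms; the rest are filtered locally and never reach the manager.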

E.  COMMON  OBJECT  REQUEST  BROKER  ARCHITECTURE  (CORBA) 

The  Common  Object  Request  Broker  Architecture  (CORBA)  [30]  is  a 
specification  of  a  standard  architecture  for  object  request  brokers  (ORBs).  An  ORB  is  a 
middleware  technology  that  manages  communication  and  data  exchange  between  objects. 
ORBs  promote  interoperability  of  distributed  object  systems  because  they  enable  users  to 
build  systems  by  piecing  together  objects  from  different  vendors  that  communicate  with 
each  other  via  the  ORB.  Thus,  vendors  can  develop  ORB  products  that  support 
application  portability  and  interoperability  across  different  programming  languages, 
hardware  platforms,  operating  systems,  and  ORB  implementations. 

Using  a  CORBA-compliant  ORB,  a  client  can  transparently  invoke  a  method  on  a 
server  object,  which  can  be  on  the  same  machine  or  across  a  network.  The  ORB 
intercepts  the  call,  and  is  responsible  for  finding  an  object  that  can  implement  the  request, 
passing  it  the  parameters,  invoking  its  method,  and  returning  the  results  of  the  invocation. 
The  client  does  not  have  to  be  aware  of  where  the  object  is  located,  its  programming 
language,  its  operating  system  or  any  other  aspects  that  are  not  part  of  an  object's 
interface. The vision behind CORBA is that distributed systems are conceived and
implemented  as  distributed  objects.  The  interfaces  to  these  objects  are  described  in  a 
high-level,  architecture-neutral  specification  language  that  also  supports  object-oriented 
design  abstraction. 
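
The idea of location-transparent invocation through a broker can be illustrated with a toy example. This is not the CORBA API and involves no IDL, marshalling, or network transport; ToyORB, its methods, and the InterfaceStatus servant are hypothetical names used only to show how a client-side proxy hands a call to a broker, which locates the target object, invokes the method, and returns the result.

```python
class ToyORB:
    """In-process stand-in for an object request broker."""

    def __init__(self):
        self._servants = {}

    def register(self, name, servant):
        """Make a server object reachable under a logical name."""
        self._servants[name] = servant

    def resolve(self, name):
        """Give the client a proxy; the client never sees the servant itself."""
        return _Proxy(self, name)

    def invoke(self, name, method, args):
        """Find the object, pass the parameters, invoke, and return the result."""
        servant = self._servants[name]
        return getattr(servant, method)(*args)


class _Proxy:
    """Client-side stub that forwards every method call to the broker."""

    def __init__(self, orb, name):
        self._orb = orb
        self._name = name

    def __getattr__(self, method):
        def call(*args):
            return self._orb.invoke(self._name, method, args)
        return call


class InterfaceStatus:
    """Hypothetical server object exposing one management operation."""

    def status(self, ifname):
        return {"eth0": "up", "eth1": "down"}.get(ifname, "unknown")


if __name__ == "__main__":
    orb = ToyORB()
    orb.register("RouterA", InterfaceStatus())
    router = orb.resolve("RouterA")
    print(router.status("eth0"))
```

The client code calls router.status as if the object were local; in a real ORB the same call could cross process, machine, and language boundaries.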

The  CORBA  specification  was  developed  by  the  Object  Management  Group 
(OMG),  an  industry  group  with  over  six  hundred  member  companies  representing 
computer manufacturers, independent software vendors, and a variety of government and
academic  organizations.  Thus,  CORBA  specifies  an  industry/consortium  standard,  not  a 
formal  standard  in  the  IEEE/ANSI/ISO  sense  of  the  term.  The  OMG  was  established  in 
1988,  and  the  initial  CORBA  specification  emerged  in  1992.  Since  then,  the  CORBA 
specification  has  undergone  significant  revision,  with  the  latest  major  revision  (CORBA 
v2.0) released in July 1996.

Benefits:  CORBA  works  with  SNMP,  CMIP  and  most  major  element 
management  and  network  management  platforms.  To  simplify  the  migration  from  SNMP 
and  CMIP  to  CORBA,  the  TeleManagement  Forum  (TMF)  and  OMG  have  defined 
standard  mappings  between  the  information  models  and  protocols. 

Most  major  carriers  are  already  using  and  promoting  CORBA.  Because  CORBA 
is  pervasive  in  enterprise  information  technology  applications,  it  is  already  heavily  used 
by  telecommunications  service  providers.  In  addition,  CORBA  is  the  means  that  most 
element  management  platforms  and  network  management  platforms  use  to  communicate 
with  each  other.  By  extending  the  use  of  CORBA  to  Network  Element  device 
management,  carriers  can  simplify  their  network  management  systems  and  more  tightly 
integrate  their  management  and  IT  networks. 

Limitations: 

•  Programming  language  support.  IDL  is  a  "least-common 
denominator"  language.  It  does  not  fully  exploit  the  capabilities  of 
programming  languages  to  which  it  is  mapped,  especially  where 
the  definition  of  abstract  types  is  concerned. 

•  Pricing  and  licensing.  The  price  of  ORBs  varies  greatly,  from  a 
few  hundred  to  several  thousand  dollars.  Licensing  schemes  also 
vary.

• Training. Training is essential even for the experienced
programmer: five days of hands-on training in CORBA
programming fundamentals is suggested.

•  Security.  CORBA  specifies  only  a  minimal  range  of  security 
mechanisms;  more  ambitious  and  comprehensive  mechanisms 
have  not  yet  been  adopted  by  the  OMG. 

F.  OTHER  TECHNOLOGIES 

The  market  consists  of  many  non-open  standards  NM  solutions  as  well.  The  next 
few  paragraphs  summarize  other  NM  technologies  that  are  proprietary  [28]: 

Microsoft  Systems  Management  Server  (SMS)  allows  system  administrators 
very  flexible  control  of  networks  of  Windows  machines.  Software  applications  deployed 
on  host  machines  can  be  determined  by  remotely  viewing  the  local  Windows  registry  (a 
type of configuration database) on each machine. This can be very useful on large sites for
verifying that software licenses have not been exceeded, for example by too many users
installing a given package. SMS also allows software to be distributed to destination
machines. A major drawback of SMS is that it only works on Windows machines. In other
words, it is technology dependent, unlike open standard technologies such as SNMP, which
are technology neutral. Therefore, SMS is a solution for select organizations that require
this capability.

Telnet  refers  to  a  menu-based  Command  Line  Interface  (CLI)  style  of 
management.  This  approach  requires  the  management  user  to  connect  to  the  IP  address  of 
a  given  device  using  telnet.  The  device  then  provides  a  text  menu-based  application  with 
which  the  user  interacts.  This  is  useful  and  is  a  widely  adopted  approach  for  device 
management.  It  is  generally  possible  to  use  telnet  to  configure  devices  such  as  laser 
printers, routers, switches, and terminal servers. The drawback is that menu-based
management systems are proprietary by their nature and do not easily lend themselves to
centralized, standards-based management (as does SNMP).

Serial link-based menu systems are very similar to NEs that support telnet; only
the access technology is different. Normally, a serial link-based system includes simple
text menus (accessed via a serial interface) that are used for initial configuration. Typical
devices for these facilities include small terminal servers. Often, these devices do not
have an IP address, and the user configures one via the menu system. Connecting the
device to an appropriately configured PC serial port facilitates this. Again, by its nature
this is proprietary.

Desktop  Management  Interface  (DMI)  was  developed  by  the  Desktop 
Management  Task  Force  and  is  completely  independent  of  SNMP.  Its  purpose  is  the 
management  of  desktop  environments,  and  it  includes  components  similar  to  those  of 
SNMP,  such  as  DMI  clients  (similar  to  SNMP  managers),  DMI  service  providers  (similar 
to SNMP agents), the DMI management information format (similar to the MIB), and
DMI  events  (similar  to  SNMP  notifications). 